
You can do Q-learning with a transformer. You simply define the state as the sequence of observations so far. This is in fact the natural thing to do in partially observed settings, so your distinction does not make sense.
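For concreteness, here is a minimal sketch (my own illustration, not from the comment) of that idea in PyTorch: a small transformer encodes the observation history, and the standard Bellman-target update is applied to the resulting Q-values. Names like SeqQNet and obs_dim are assumptions for the example.

```python
import torch
import torch.nn as nn

class SeqQNet(nn.Module):
    """Q-network whose 'state' is the whole observation sequence."""
    def __init__(self, obs_dim, n_actions, d_model=64, n_layers=2, n_heads=4):
        super().__init__()
        self.embed = nn.Linear(obs_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.q_head = nn.Linear(d_model, n_actions)

    def forward(self, obs_seq):                 # obs_seq: (batch, T, obs_dim)
        h = self.encoder(self.embed(obs_seq))   # encode the full history
        return self.q_head(h[:, -1])            # Q-values given the whole sequence

def q_update(net, target_net, batch, gamma=0.99):
    """Ordinary Bellman-target regression, applied to sequence-valued states."""
    obs_seq, action, reward, next_obs_seq, done = batch
    q = net(obs_seq).gather(1, action.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = reward + gamma * (1 - done) * target_net(next_obs_seq).max(dim=1).values
    return nn.functional.mse_loss(q, target)
```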


The distinction I'm drawing is between DT's reward-to-go conditioning and Q-learning's Bellman backup (with its discount factor), not the choice of architecture for the policy (see the sketch below). You could also do DTs with RNNs (though those have their own problems with memory).

Apologies if we're talking past one another.
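To make the contrast concrete, a small sketch of the two training signals (my own framing, names are illustrative): a Decision Transformer conditions on undiscounted reward-to-go and learns actions by supervised regression, while Q-learning regresses toward a bootstrapped, discounted Bellman target.

```python
def returns_to_go(rewards):
    """DT's conditioning signal: undiscounted sum of future rewards, no bootstrapping."""
    rtg, total = [], 0.0
    for r in reversed(rewards):
        total += r
        rtg.append(total)
    return list(reversed(rtg))

# Decision Transformer: supervised regression of actions given reward-to-go
#   loss = || policy(rtg, obs_seq) - action ||^2
# Q-learning: bootstrapped, discounted Bellman target
#   loss = || Q(s, a) - (r + gamma * max_a' Q_target(s', a')) ||^2
```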



