
Care to elaborate on how AZ/MZ are not model-based? MuZero learns a transition function. AZ uses a known transition model for planning, but I would still call that model-based RL.


AZ uses a known transition model. I wouldn't call this model-based, because when people say model-based they usually mean learning a world model.

MuZero doesn't actually learn a state transition function; it only models reward. It's a model that predicts reward given an action sequence. I suppose we could consider this a hybrid that's slightly model-based, because it does have an incentive to learn what's going on. But there's no feedback loop where we predict future world states and feed them back in.
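
To make the distinction concrete, here's a toy sketch. It is not AlphaZero or MuZero code, and every name in it (`known_step`, `fit_reward_table`, the 1-D walk itself) is made up for illustration. The first planner rolls explicit states through a known simulator. The second only ever queries a table that maps an action sequence from the root to a predicted return, and never produces an intermediate state. In this sketch the table is filled from recorded rollouts rather than trained.

```python
from itertools import product

ACTIONS = (-1, 1)   # hypothetical toy action set: step left or right
GOAL = 3            # hypothetical goal position on a 1-D line


def known_step(state, action):
    """Known simulator: returns (next_state, reward). Reward is 1.0 on landing on GOAL."""
    nxt = state + action
    return nxt, (1.0 if nxt == GOAL else 0.0)


def rollout_return(state, action_seq):
    """Sum of rewards from applying action_seq in the known simulator."""
    total = 0.0
    for a in action_seq:
        state, r = known_step(state, a)
        total += r
    return total


def plan_with_known_model(state, depth):
    """Search by stepping explicit states through the known transition model."""
    best_seq, best_ret = None, float("-inf")
    for seq in product(ACTIONS, repeat=depth):
        ret = rollout_return(state, seq)
        if ret > best_ret:
            best_seq, best_ret = seq, ret
    return best_seq, best_ret


def fit_reward_table(root_state, depth):
    """Stand-in for learning: record the observed return of each action sequence.

    A real system would fit a function approximator from sampled experience;
    this table is an assumption made to keep the sketch tiny.
    """
    return {seq: rollout_return(root_state, seq)
            for seq in product(ACTIONS, repeat=depth)}


def plan_with_reward_model(reward_table):
    """Pick the action sequence with the highest predicted return.

    Only (action sequence -> return) is consulted; no future state is predicted.
    """
    best_seq = max(reward_table, key=reward_table.get)
    return best_seq, reward_table[best_seq]


if __name__ == "__main__":
    print(plan_with_known_model(0, 3))
    print(plan_with_reward_model(fit_reward_table(0, 3)))
```

Both planners pick the same action sequence here. The difference is only in what the planner gets to see: explicit next states in the first case, predicted returns in the second.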



