Care to elaborate on how AZ/MZ are not model-based? MuZero learns a transition func...

Straw · 2025-06-19T13:51:46 1750341106

AZ uses a known transition model: the game's rules. I wouldn't call this model-based, because when people say model-based they usually mean learning a world model.
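
A minimal sketch of what "known transition model" means in practice. This uses a made-up toy game (take 1 to 3 stones, whoever takes the last stone wins), and `legal_moves`, `apply_move`, `negamax` and `best_move` are hypothetical names, not AlphaZero code. AlphaZero swaps the exhaustive negamax below for MCTS guided by a policy/value network, but the relevant part is the same: every imagined next state comes from the exact rules, and nothing about the transitions is learned.

```python
from functools import lru_cache

# Hypothetical toy game: a pile of stones, each move removes 1-3 stones,
# and whoever takes the last stone wins.
def legal_moves(stones: int) -> list[int]:
    return [k for k in (1, 2, 3) if k <= stones]

def apply_move(stones: int, k: int) -> int:
    # The *known* transition model: exact rules, nothing learned.
    return stones - k

@lru_cache(maxsize=None)
def negamax(stones: int) -> int:
    """Value for the player to move: +1 = win, -1 = loss, under perfect play."""
    if stones == 0:
        # The previous player took the last stone, so the player to move lost.
        return -1
    return max(-negamax(apply_move(stones, k)) for k in legal_moves(stones))

def best_move(stones: int) -> int:
    # Every imagined future state is produced by apply_move, i.e. the real rules.
    return max(legal_moves(stones), key=lambda k: -negamax(apply_move(stones, k)))
```

For example, `best_move(6)` returns 2, leaving the opponent on a pile of 4, which `negamax` scores as lost for the player to move.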

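MuZero doesn't actually learn a transition function over real world states. Its dynamics function maps a learned latent state and an action to a predicted reward and a next latent state, and training only asks those latents to predict reward, value and policy along an action sequence. I suppose we could consider this a hybrid that's slightly model-based, because it does have an incentive to learn what's going on. But there's no feedback where we predict future world states (observations) and check those predictions against what actually happens.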
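
A rough sketch of that training signal, following the representation / dynamics / prediction split described for MuZero. The three functions here (`represent`, `dynamics`, `predict`) are hypothetical hand-written stand-ins for the networks so it runs as plain Python, and the unroll loss is a simplified version (squared error for reward and value, cross-entropy for policy). The point is what the loss compares: the next latent state is only ever scored through the reward, value and policy it leads to, never against an encoding of the next observation.

```python
import math

# Hypothetical stand-ins for MuZero's learned functions (real ones are networks).
def represent(obs: list[float]) -> list[float]:
    """h: observation -> latent state. Nothing forces it to resemble the world."""
    return [sum(obs) / len(obs), max(obs)]

def dynamics(latent: list[float], action: int) -> tuple[float, list[float]]:
    """g: (latent, action) -> (predicted reward, next latent)."""
    reward = latent[0] * action
    next_latent = [0.5 * latent[0] + action, latent[1]]
    return reward, next_latent

def predict(latent: list[float], num_actions: int = 2) -> tuple[list[float], float]:
    """f: latent -> (policy logits, value)."""
    logits = [latent[0] * a for a in range(num_actions)]
    value = latent[0] + latent[1]
    return logits, value

def cross_entropy(logits: list[float], target_probs: list[float]) -> float:
    # Numerically stable log-softmax, then cross-entropy against a target distribution.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return -sum(p * (l - log_z) for l, p in zip(logits, target_probs))

def unroll_loss(obs, actions, target_rewards, target_values, target_policies) -> float:
    """Unroll the latent model along `actions` and score only reward/value/policy."""
    latent = represent(obs)
    loss = 0.0
    for k in range(len(actions) + 1):
        logits, value = predict(latent)
        loss += (value - target_values[k]) ** 2
        loss += cross_entropy(logits, target_policies[k])
        if k < len(actions):
            reward, latent = dynamics(latent, actions[k])
            loss += (reward - target_rewards[k]) ** 2
    # Note: no term compares `latent` with represent(next_observation);
    # the latent may encode whatever helps predict the targets above.
    return loss
```

Nothing in `unroll_loss` takes a future observation as input, so the latent states are shaped purely by how well they predict reward, value and policy.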