"It allows AI to understand its physical environment and also to self-improve ov...

ks1723 · 2025-04-07T09:46:36 1744019196

I my view, the 'exactly' is crucial here. They do implicitly tell the model what to do by encoding it in the reward function:

In Minecraft, the team used a protocol that gave Dreamer a ‘plus one’ reward every time it completed one of 12 progressive steps involved in diamond collection — including creating planks and a furnace, mining iron and forging an iron pickaxe.

This is also why I think the title of the article is slightly misleading.

wongarsu · 2025-04-07T18:29:44 1744050584

It's kind of fair, humans also get rewarded for those steps when they learn Minecraft

xwolfi · 2025-04-08T02:05:42 1744077942

But they don't learn that way at all, my 7yo learns by watching youtubers. There's a whole network of people teaching each other the game, that's almost more fun than playing it alone.