In my view, the 'exactly' is crucial here. They do implicitly tell the model what to do by encoding it in the reward function:
In Minecraft, the team used a protocol that gave Dreamer a ‘plus one’ reward every time it completed one of 12 progressive steps involved in diamond collection — including creating planks and a furnace, mining iron and forging an iron pickaxe.
This is also why I think the title of the article is slightly misleading.
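To make the point concrete, here is a rough sketch of what a milestone reward like the one in the quote could look like. This is my own illustration, not the team's code: the class name, the inventory-dict interface, and the item names are hypothetical, the list only includes the steps the quote actually names (the real protocol has 12), and I'm assuming each step is rewarded at most once per episode.

```python
# Illustrative subset of the steps named in the quote; the real protocol
# uses 12 progressive steps. Item names here are hypothetical.
MILESTONES = ["planks", "furnace", "iron_ore", "iron_pickaxe", "diamond"]


class MilestoneReward:
    """Gives +1 the first time each milestone item shows up in the inventory."""

    def __init__(self, milestones=MILESTONES):
        self.milestones = list(milestones)
        self.achieved = set()

    def reset(self):
        # Call at the start of each episode (assumption: rewards are per episode).
        self.achieved.clear()

    def step(self, inventory):
        # inventory: dict mapping item name -> count (hypothetical interface).
        reward = 0
        for item in self.milestones:
            if item not in self.achieved and inventory.get(item, 0) > 0:
                self.achieved.add(item)
                reward += 1
        return reward
```

Even in this toy version, the ordered list of milestones is a fairly detailed description of the path to a diamond. The agent is never told how to do each step, but it is told which steps count.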
But they don't learn that way at all; my 7yo learns by watching YouTubers. There's a whole network of people teaching each other the game, and that's almost more fun than playing it alone.