Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

"It allows AI to understand its physical environment and also to self-improve over time, without a human having to tell it exactly what to do."


I my view, the 'exactly' is crucial here. They do implicitly tell the model what to do by encoding it in the reward function:

In Minecraft, the team used a protocol that gave Dreamer a ‘plus one’ reward every time it completed one of 12 progressive steps involved in diamond collection — including creating planks and a furnace, mining iron and forging an iron pickaxe.

This is also why I think the title of the article is slightly misleading.


It's kind of fair, humans also get rewarded for those steps when they learn Minecraft


But they don't learn that way at all, my 7yo learns by watching youtubers. There's a whole network of people teaching each other the game, that's almost more fun than playing it alone.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: