
It's interesting that, while the system can learn lots of different games, you still have to hand it a reward function specific to each game. This may seem obvious, and not much of a limitation; after all, lots of things that we think of as intelligent to varying degrees (different human beings, as well as members of other species) have wildly different ideas of what constitutes "reward", so no particular reward function can be an inherent part of the definition of intelligence.
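To make that concrete, here is roughly what "a reward function specific to each game" means in practice. This is a hypothetical sketch in Python, not the actual interface from the paper; the state fields and numbers are made up:

    # Hypothetical sketch: the learning algorithm is game-agnostic, but each
    # game still needs a hand-written mapping from game state to reward.

    def pacman_reward(prev, cur):
        # Reward score gained (pellets, ghosts eaten); punish losing a life.
        r = cur.score - prev.score
        if cur.lives < prev.lives:
            r -= 100
        return r

    def pong_reward(prev, cur):
        # +1 when we score, -1 when the opponent does.
        return (cur.our_score - prev.our_score) - (cur.their_score - prev.their_score)

    # Only this table changes from game to game.
    REWARD_FUNCTIONS = {"pacman": pacman_reward, "pong": pong_reward}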

But you don't have to explicitly tell a human what the reward function for Pac-Man is. Show a human the game and they'll figure it out. Which makes me wonder if, while there is some room for variability in reward functions, there might be some basic underlying reward computation that is inherent in intelligence. I can't find the link just now, but I read an article a few months ago (might've even been here on HN) about a system that demonstrated the appearance of intelligent behavior by trying to minimize entropy and maximize its possible choices for as long as possible within a given world model.
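For what it's worth, the "maximize its possible choices" part of that article can be sketched in a few lines of Python. This is just my illustration of the idea; the toy world model, action set, and horizon are assumptions, not details from the article:

    def reachable_states(world_model, state, actions, horizon):
        # Breadth-first enumeration of states reachable within `horizon` steps
        # of a known, deterministic world model (a function: state, action -> state).
        seen = {state}
        frontier = {state}
        for _ in range(horizon):
            frontier = {world_model(s, a) for s in frontier for a in actions} - seen
            seen |= frontier
        return seen

    def openness(world_model, state, actions, horizon=5):
        # Prefer states from which many distinct futures remain reachable.
        return len(reachable_states(world_model, state, actions, horizon))

An agent that picks actions to keep this number high tends to avoid dead ends and hold its options open, which is the sort of behaviour the article described as looking purposeful.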



Pac-Man was designed as a game for humans, with a priori knowledge of what kinds of things humans find rewarding. Thus the goal is obvious because it was designed to be similar to other human goals. Eat the food, don't get eaten. For this reason, it's not at all special that humans can determine the goal of the game.


Yeah. Try sticking a more abstract game like Go in front of a random person and see how that works out. Without being taught the rules, a human will have absolutely no idea how to proceed. This would put a human in pretty much the same boat as a computer.


A friend of mine had a meta-game he'd play with his stepfather. His stepfather would buy a new game, but not tell him the rules. They'd play this game until he figured it out and consistently trounced his stepdad. Then his stepdad would buy a new game.


Wow, that's a great idea. Sounds like loads of fun.


Secure the largest amount of territory and capture enemy groups? Seems pretty human :p


Not to Edward Lasker: "The rules of go are so elegant, organic and rigorously logical that if intelligent life forms exist elsewhere in the universe they almost certainly play go."


You got all that from looking at a 19x19 grid?


I guess that the reward systems that humanity has evolved are complicated and numerous. We've got the basics (food, shelter), the more complicated basics (sex with a suitable mate, companionship) and a million other factors: curiosity, intellectual challenge, positive and negative feedback, power, agency, etc.

My thoughts are that if they were to take such a direction with this AI, they'd give it the basics and let it evolve and learn its own complicated reward structure. When you're trying to get a monkey to play Pac-Man, you bribe him with a capful of Ribena - he doesn't care about fun intellectual challenges, but sweet liquids motivate the hell out of him.

(This is the state of actual monkey research - Ribena is monkey crack.)


To start on this you would want a system that got rewarded based on how well it was able to predict aspects of its environment. This would have to go hand in hand with preferring stimulation, so you would also need something like a preference for inputs that maximize relative entropy with respect to the model it has learned so far.
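A minimal sketch of that second piece, assuming a simple count-based model over discrete observations (the class and setup are mine, purely for illustration). "Relative entropy" here is the KL divergence between the model's predictive distribution before and after seeing the new input, so inputs the model already predicts well earn almost nothing:

    import math
    from collections import Counter

    class SurpriseReward:
        # Toy predictive model over a fixed discrete alphabet of observations.
        # The intrinsic reward for an observation is how far it moves the model:
        # D_KL(new distribution || old distribution).

        def __init__(self, alphabet):
            self.counts = Counter({o: 1 for o in alphabet})  # Laplace-style prior

        def _dist(self):
            total = sum(self.counts.values())
            return {o: c / total for o, c in self.counts.items()}

        def observe(self, obs):
            # `obs` is assumed to be a symbol from the alphabet.
            before = self._dist()
            self.counts[obs] += 1
            after = self._dist()
            return sum(p * math.log(p / before[o]) for o, p in after.items())

    model = SurpriseReward("ab")
    print(model.observe("a"))  # fairly novel: noticeable reward
    print(model.observe("a"))  # repeats of "a" earn less and less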


That article on entropy minimization also claimed that a single equation could be the basis of a wide range of intelligent behaviours.
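Assuming the article in question is the one on causal entropic forces (Wissner-Gross and Freer, PRL 2013) - which seems to fit the description - the "single equation" is, roughly:

    F(X_0, \tau) = T_c \, \nabla_X S_c(X, \tau) \big|_{X = X_0}

i.e. a force pushing the system up the gradient of S_c, the entropy over its possible future paths out to a time horizon tau, with T_c setting the strength of the drive to keep future options open.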

https://news.ycombinator.com/item?id=5579047


For tasks that do not reward us biologically (unlike, say, eating or sleeping), we ultimately depend on other people to supply the reward, be that money, acceptance, praise, or whatever.


Yeah, this strikes me as similar to - and basically no more intelligent than - Eurisko. Such is the progress we have made on AI in 40 years.


No. AIXI is unique, in the technical sense of that word. It's similar to Eurisko in that there are goals and things. If you get any deeper, the similarity ends.



