> You can't start from scratch and expect your program to repeat 4 billion years of evolution collecting inductive biases useful in our corner of our Universe in a matter of hours
Really? Minecraft's gameplay dynamics are not particularly complex... The AI here isn't learning highly complex rules about the nuances of human interaction, or learning to detect the relatively subtle differences between various four-legged creatures based on small differences in body morphology. In those cases I could see how millions of years of evolution matter, at least to give us and other animals a head start when entering the world. If the AI had to do something like that to progress in Minecraft then I'd get why learning those complexities would be skipped over.
But in this case a human would quickly understand that holding a button creates a state which tapping a button does not, and would therefore assume this state could be useful for reaching further states. Identifying this doesn't seem particularly complex to me. If the argument is that it will take slightly longer for an AI to learn patterns in dependent states then okay, sure, but arguing that learning that holding a button creates a new state is such a complex problem that we couldn't possibly expect an AI to learn it from scratch within a short timeframe is a very weak argument. It's just not that complex. To me this suggests that current algorithms are lacking.
It seems easy to you because you can't remember the years when you were a toddler and had to learn basic interactions with the world around you. It seems natural to an adult but it is quite complex.
But this argument applies just as well to tons of other tasks AIs can handle just fine. So it doesn't explain why this particular action is so much harder compared to anything else.
In particular, the task requires understanding that one can impact the world through action. Humans learn this through a constant feedback loop running for months to a year or more. The way we train AIs doesn't seem to teach this agency, only the ability to mimic having it in ways we can capture data for (such as online discussions). Will that training eventually give rise to such agency? I'm doubtful with most current models, given that the learning process is so disconnected from execution and that execution is prompted rather than inherently ongoing. Maybe some agent swarm that is always running and always training and upgrading its members could achieve that level of agency, which is why I'm not saying it is impossible, but I expect we will have to wait for a newer kind of model that is always running and training as it runs before true agency develops.
Until then, it is a question of whether we can capture the appearance of agency in the training set well enough to learn it from training alone, rather than depending on interaction to learn more.
I don't think I am, and for context, I have built my own DQNs from scratch to learn to play games like Snake.
I'd argue that if you consider the size of the input and output spaces here, it's not as complex as you're implying.
To refer back to my example, telling the difference between four-legged creatures is complicated because there's a huge number of possible outputs and the visual input space is both large and complex. Learning how to detect patterns in raw image data is hard, which is why we and other animals are preloaded with the neurological structures to do it. It's also why we often use pretrained models when training models to label new outputs – simply learning how to detect basic patterns in visual data is difficult enough, so if that step can be skipped it often makes sense to skip it.
In contrast, the inputs to Minecraft are relatively simple – you have a handful of buttons which can be pressed, and those buttons can be pressed for different durations. Similarly, the output space here, while large, is relatively simple, and presumably detecting that an action like holding a button results in a state change shouldn't be that complex to learn... I mean, it's already learning that pressing a button results in a state change, so I think you'd need to explain to me why adding a tiny bit of additional complexity here is so unreasonable. Maybe I'm missing something.
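To pin down what I mean by tap vs. hold, here's a toy sketch of the dynamics as I understand them (entirely hypothetical numbers and interface, not the actual Minecraft/MineRL environment): tapping produces an immediate, visible effect every step, while the "held" state only appears after the same action has been repeated for several consecutive steps, with no reward in between.

```python
# Hypothetical toy dynamics to illustrate "tap vs hold".
# Not the real Minecraft/MineRL interface; numbers are made up.

HOLD_STEPS = 10  # consecutive presses needed before the "held" state appears

class TapVsHoldEnv:
    def __init__(self) -> None:
        self.held_for = 0

    def step(self, press: bool):
        """One timestep: press the button (True) or release it (False)."""
        self.held_for = self.held_for + 1 if press else 0  # releasing resets progress
        tapped = press                          # immediate, visible effect of any press
        held = self.held_for >= HOLD_STEPS      # new state only after sustained pressing
        reward = 1.0 if held else 0.0           # no intermediate reward along the way
        return (tapped, held), reward
```

A human guesses that such a state might exist and deliberately tries holding the button; an agent exploring randomly only sees evidence the state exists after it happens to repeat the press many times in a row.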
> I think you'd need to explain to me why adding a tiny bit of additional complexity here is so unreasonable
As far as I understand, DreamerV3 doesn't employ intrinsic rewards (as in novelty-based exploration). It relies on stochastic exploration, which makes it practically impossible to reach rewards that require consistently repeating an action with no intermediate rewards.
And finding intrinsic rewards that work well across diverse domains is a complex problem in itself.
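To put a rough number on why pure stochastic exploration struggles here (a back-of-the-envelope illustration with a made-up action count, not DreamerV3's actual action space): if the policy samples roughly uniformly, the chance of picking one particular action at every one of k consecutive steps falls off exponentially, so behaviours that only pay off after a long uninterrupted hold are essentially never stumbled into by chance.

```python
# Toy numbers, not DreamerV3's real action space: probability that uniform
# random exploration picks one particular action at each of k consecutive steps.

def p_hold(n_actions: int, k_steps: int) -> float:
    return (1.0 / n_actions) ** k_steps

for k in (5, 20, 60):
    print(f"hold for {k:2d} steps: {p_hold(20, k):.1e}")

# hold for  5 steps: 3.1e-07
# hold for 20 steps: 9.5e-27
# hold for 60 steps: 8.7e-79
```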
Example: when humans play Minecraft, they already know object permanence from the real world. I did not see anywhere that the AI got trained to learn object permanence, yet it is required for basics like finding your mineshaft again after turning around.
> Minecraft's gameplay dynamics are not particularly complex...
I think you underestimate the complexity of going from 12288+400 changing numbers to a concept of gameplay dynamics in the first place. Or in other words, your complexity prior is biased by your experience.
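For concreteness, my reading of those numbers (an assumption on my part, not something spelled out above): 12288 is what a 64x64 RGB frame flattens to, and the ~400 would be the non-visual scalar inputs, all of which change every step.

```python
# Assumed breakdown of "12288+400 changing numbers" (my interpretation, not quoted):
pixels = 64 * 64 * 3            # flattened 64x64 RGB frame -> 12288 values
extras = 400                    # approximate count of non-visual inputs
print(pixels, pixels + extras)  # 12288 12688
```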