Hmm... But you don't have to rely on experiments, no? Ilya's original argument was that language is a representation of reality, so even from noisy internet data, with zero feedback loops or experiments, a sufficient amount of compute could let LLMs recover the underlying world model to some degree. Wouldn't the same hold for cameras and robot interactions? Just predict the next frame of reality the same way you predict the next token of language...
(Actions leading to reactions may or may not be part of the vector we are learning. They should be, but it's not strictly necessary; see the rough sketch below.)
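For concreteness, here's roughly the recipe I'm picturing: the same autoregressive setup as next-token prediction, just with frames in place of tokens and the action optionally concatenated in. This is a toy sketch only; every size and module here is made up for illustration (a real system would use a large transformer over tokenized frames, not a tiny GRU over raw pixels):

```python
import torch
import torch.nn as nn

FRAME_DIM, ACTION_DIM, HIDDEN = 1024, 16, 512  # hypothetical sizes

class NextFramePredictor(nn.Module):
    def __init__(self):
        super().__init__()
        # The GRU stands in for the autoregressive backbone.
        self.rnn = nn.GRU(FRAME_DIM + ACTION_DIM, HIDDEN, batch_first=True)
        self.head = nn.Linear(HIDDEN, FRAME_DIM)

    def forward(self, frames, actions):
        # frames: (batch, time, FRAME_DIM); actions: (batch, time, ACTION_DIM)
        h, _ = self.rnn(torch.cat([frames, actions], dim=-1))
        return self.head(h)  # predicted frame t+1 from history up to t

model = NextFramePredictor()
frames = torch.randn(8, 32, FRAME_DIM)    # fake video clips
actions = torch.randn(8, 32, ACTION_DIM)  # fake robot actions
pred = model(frames[:, :-1], actions[:, :-1])
loss = nn.functional.mse_loss(pred, frames[:, 1:])  # "next-token" loss on frames
loss.backward()
```

Dropping the actions input gives you the passive, camera-only version of the same objective.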
No? What am I missing?
Just astronomical compute required? Or something more fundamental?
Try translating differential equations, musical notation, or chemical formulas into English. When we find one language useless or inefficient at representing reality, we create another. Language is just a tool we use to think and to transfer info from one chimp brain to another.