There are lots of problems where someone has to run experiments to generate data. If even the most optimized version of the experiment is expensive and slow to produce a single data point, then all you can do is wait until more data trickles in before a solution is found. Think drug discovery.
Hmm... But you don't have to rely on experiments, no? Ilya's original argument was that language is a representation of reality, so even from noisy internet data, with zero feedback loops or experiments, a sufficient amount of compute could let LLMs recover the underlying world model to some degree. Wouldn't the same hold true with cameras and robot interactions? Just predict the next frame of reality the same way you predict the next token of language...
(Actions leading to reactions may or may not be part of the vector we are learning. I mean they should be, but they're not strictly necessary.)
No? What am I missing?
Is it just the astronomical compute required? Or something more fundamental?
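To make the analogy above concrete: "predict the next frame" is structurally the same objective as "predict the next token" -- autoregressive prediction of the next element of a sequence, with MSE over continuous frames standing in for cross-entropy over a discrete vocabulary. Here's a toy sketch of that idea, where a fixed linear dynamics matrix plays the role of "reality" and a least-squares fit plays the role of the learned world model (all of this is illustrative, not anyone's actual setup):

```python
import numpy as np

# Toy sketch: next-frame prediction as autoregressive modelling.
# "Reality" is a hidden linear dynamics A_true; the "video" is the
# sequence of states it generates. The learner never interacts or
# experiments -- it only watches frames and predicts the next one.
rng = np.random.default_rng(0)

A_true = rng.normal(size=(8, 8)) * 0.3  # hidden world dynamics
frames = [rng.normal(size=8)]
for _ in range(999):
    # next frame = dynamics applied to current frame, plus noise
    frames.append(A_true @ frames[-1] + 0.05 * rng.normal(size=8))

X = np.stack(frames[:-1])  # frame t   (the "context")
Y = np.stack(frames[1:])   # frame t+1 (the "next token")

# Fit a predictor minimizing ||X @ W - Y||^2 -- same objective shape
# as next-token prediction, with MSE instead of cross-entropy.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)
A_hat = W.T  # rows act as y = A_hat @ x, comparable to A_true

err = np.linalg.norm(A_hat - A_true) / np.linalg.norm(A_true)
print(f"relative error in recovered dynamics: {err:.3f}")
```

The point of the toy: with enough passive observations, the predictor recovers the hidden dynamics reasonably well without ever running an experiment -- which is exactly the shape of the claim being debated. Whether that scales to real pixels and real physics is the open question.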
Try translating differential equations or musical notation or chemical formulas into English. When we find one language useless or inefficient at representing reality, we create another. Language is just a tool we use to think and transfer info from one chimp brain to another.