Vision and everyday-physics models are the answer: hallucinations will stop when the models stop thinking in words and start thinking in physical reality.
LLM developers had easy access to a large corpus of writing to train on, way larger than anything a human being ever learned their own language from. I can't see where they are going to find a comparably large corpus of physical interaction with reality to train that kind of model.