Came here to say this. Their paper reeks of wishful thinking and of labeling things in the terms they would prefer them to be. They even note in one place that their replacement model has 50% accuracy, which is simply a fancy way of saying the model's result is pure chance and could be interpreted one way or the other. Like flipping a coin.
In reality, all that's happening is sampling from the probability distribution over the next token, given the tokens in the context window. That's what the model is designed to do, trained to do - and that's exactly what it does. More precisely, that is what the algorithm does, using the model weights, the input ("prompt", tokenized) and the previously generated output, one token at a time. Unless the algorithm is started (by a human, ultimately), nothing happens. Note how entirely different that is from any living being that actually thinks.
All interpretation above and beyond that is speculative and all intelligence found is entirely human.
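To make that concrete, here is a minimal sketch of the decoding loop described above. The weights object and next_token_logits are placeholders standing in for one forward pass of whatever model is being run; only the loop structure is the point.

    import math, random

    def softmax(logits):
        m = max(logits)
        exps = [math.exp(x - m) for x in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def sample_continuation(weights, prompt_tokens, n_new, next_token_logits):
        # next_token_logits(weights, tokens) is a hypothetical stand-in for a
        # forward pass of the model; it returns one logit per vocabulary entry.
        tokens = list(prompt_tokens)
        for _ in range(n_new):
            # Condition on the prompt plus everything generated so far...
            probs = softmax(next_token_logits(weights, tokens))
            # ...and draw the next token from that distribution. Nothing else
            # happens unless something drives this loop forward.
            tokens.append(random.choices(range(len(probs)), weights=probs)[0])
        return tokens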
Yes, GPUs process a single computational task on a vast array of data in parallel. But they cannot process two independent tasks concurrently (except, perhaps, by splitting compute power between them).
Actually, compression is an incredibly good way to think about intelligence. If you understand something really well then you can compress it a lot. If you can compress most of human knowledge effectively without much reconstruction error while shrinking it down by 99.5%, then you must have in the process arrived at a coherent and essentially correct world model, which is the basis of effective cognition.
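As a back-of-the-envelope illustration (my own toy numbers, nothing from any paper): under an ideal code a model needs about -log2(p) bits per symbol it predicts, so better predictions show up directly as a smaller compressed size.

    import math
    from collections import Counter

    text = "the cat sat on the mat " * 40

    def bits_uniform(s):
        # No model at all: every distinct character treated as equally likely.
        return len(s) * math.log2(len(set(s)))

    def bits_frequency_model(s):
        # A very weak "model" of the text: just its character frequencies.
        counts = Counter(s)
        total = len(s)
        return sum(-math.log2(counts[c] / total) for c in s)

    print(f"uniform coding : {bits_uniform(text):.0f} bits")
    print(f"frequency model: {bits_frequency_model(text):.0f} bits")

A model that also captured the words, the grammar and the repetition would shrink it much further; that's the sense in which compression tracks understanding.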
Fwiw there are highly cited papers that literally map AGI to compression. As in, the two map to the same thing, and widely respected papers have been written on exactly that. Basically a prediction engine can be used to build either a compression tool or an AI.
The tl;dr: given inputs and a system that can accurately predict what comes next, you can either compress that data using the prediction (arithmetic coding), or you can act on the prediction to achieve an end goal - mapping predictions of new inputs to possible outcomes and then taking the path toward the goal (AGI). They boil down to one and the same thing. So it's weird to have someone state they are not the same when it's widely accepted they absolutely are.
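For anyone who hasn't seen the arithmetic-coding side of that equivalence, here's a toy sketch of my own. It uses exact fractions for clarity (real coders renormalize with integer ranges), and the predict function can be anything that returns a probability distribution over the next symbol - a stronger predictor leaves a wider final interval, i.e. fewer bits are needed to name a point inside it.

    from fractions import Fraction

    def cumulative(probs):
        """Turn {symbol: prob} into {symbol: (low, high)} slices of [0, 1)."""
        out, lo = {}, Fraction(0)
        for sym, p in sorted(probs.items()):
            out[sym] = (lo, lo + p)
            lo += p
        return out

    def encode(symbols, predict):
        # predict(prefix) -> {symbol: Fraction probability of coming next}
        lo, hi = Fraction(0), Fraction(1)
        for i, sym in enumerate(symbols):
            s_lo, s_hi = cumulative(predict(symbols[:i]))[sym]
            lo, hi = lo + (hi - lo) * s_lo, lo + (hi - lo) * s_hi
        # Any number inside the final interval identifies the whole message.
        return (lo + hi) / 2

    def decode(code, length, predict):
        out, lo, hi = [], Fraction(0), Fraction(1)
        for _ in range(length):
            for sym, (s_lo, s_hi) in cumulative(predict(out)).items():
                new_lo, new_hi = lo + (hi - lo) * s_lo, lo + (hi - lo) * s_hi
                if new_lo <= code < new_hi:
                    out.append(sym)
                    lo, hi = new_lo, new_hi
                    break
        return out

    # Deliberately dumb predictor: uniform over two symbols. A better model
    # (e.g. an LLM's next-token distribution) would assign higher probability
    # to the symbols that actually occur, wasting fewer bits per symbol.
    def uniform_predict(prefix):
        return {"a": Fraction(1, 2), "b": Fraction(1, 2)}

    msg = list("abba")
    code = encode(msg, uniform_predict)
    assert decode(code, len(msg), uniform_predict) == msg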
Isn't DNA in itself a machine of dubious abilities? It's only functional because what functions is what survives; imagine the amount of 'unsurvived' that comes from how shit the code is.
That implies higher-order reasoning. If the model does not do that, which it doesn't, that's quite simply the wrong term.