>Demis Hassabis, [] said the goal is to create “models that are able to understand the world around us.”
>These statements betray a conceptual error: Large language models do not, cannot, and will not “understand” anything at all.
This seems to be quite a common error in criticism of AI: take a reasonable statement about AI that doesn't mention LLMs, then claim the speaker (a Nobel-prize-winning AI expert, in this case) doesn't know what they're talking about because current LLMs can't do it.
DeepMind already has Project Astra, a model that works not just with language but also with vision (and probably other modalities): you can point a phone at something, ask about it, and it seems to understand what it is quite well. Example here: https://youtu.be/JcDBFAm9PPI?t=40
>DeepMind already has Project Astra, a model that works not just with language but also with vision (and probably other modalities): you can point a phone at something, ask about it, and it seems to understand what it is quite well.
Operative phrase: "seems to understand". If you had some bizarre image unlike anything anyone's ever seen before and showed it to a clever human, the human might manage to figure out what it is after thinking about it for a time. The model could never figure out anything, because it does not think. It's just a gigantic filter that takes known-and-similar images as input and mindlessly spits out a description on the other side. The language models do the same thing, do they not? They take prompts as input and emit text conditioned on those prompts. They're even deterministic, if you take the seeds into account.
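The determinism point can be illustrated with a toy sampler. A minimal sketch (the vocabulary, probabilities, and function name here are invented for illustration; real models sample from computed logits, but the seeding principle is the same):

```python
import random

# Toy next-token sampler: with a fixed seed, repeated runs over the
# same probability distribution produce identical output.
def sample_tokens(probs, vocab, n, seed):
    rng = random.Random(seed)  # fixed seed -> fixed random stream
    return [rng.choices(vocab, weights=probs)[0] for _ in range(n)]

vocab = ["the", "cat", "sat", "mat"]
probs = [0.4, 0.3, 0.2, 0.1]

a = sample_tokens(probs, vocab, 10, seed=42)
b = sample_tokens(probs, vocab, 10, seed=42)
assert a == b  # same seed, same "completion", every time
```

Change the seed and the output changes; fix it and the "creativity" is fully reproducible.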
We'll scale all those up, and they'll produce ever-more-impressive results, but none of these will ever "understand" anything.
> If you had some bizarre image unlike anything anyone's ever seen before and showed it to a clever human, the human might manage to figure out what it is after thinking about it for a time
Out of curiosity, what sort of 'bizarre image' are you imagining here? Like a machine which does something fantastical?
I actually think the quantity of bizarre imagery whose content is unknown to humans is pretty darn low.
I'm not really well-equipped to have the LLMs-to-AGI discussion; much smarter people have said much more cogent things. I will say that, anecdotally, anything I've been asking LLMs for has likely been solved many times by other humans, and in my day-to-day life I rarely find myself wanting to do things that have never been done before.
>I actually think the quantity of bizarre imagery whose content is unknown to humans is pretty darn low.
Historically, this just hasn't been the case. There are images today that would have been not merely outlandish 150 years ago but absolutely mysterious: a picture of a spiral galaxy, perhaps, or an electron micrograph of some microfauna. Humans of the time would have been able to do little more than describe the relative shapes. By the same token, there are images that no one will be familiar with for centuries to come. But if we were somehow to see them early, even without the context of how the image was produced, I strongly suspect that clever people might manage to figure out what those images represent. No model could do this.
The quantity of bizarre imagery is finite: each pixel in a raster image has a finite number of color values, and a raster image has a finite number of pixels. But the number is staggeringly large, even for the subset of images that represent real things, and even for the subset of those that represent things humans have no concept of. My imagination is too modest to touch even the surface of that space, but my cognition is sufficient to surmise that it exists.
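The pixel-counting argument can be made concrete. A minimal sketch, assuming 8-bit RGB raster images (the function name is mine):

```python
# Count the distinct images possible at a given resolution,
# assuming 8-bit color channels and 3 channels (RGB) per pixel.
def image_count(width: int, height: int,
                bits_per_channel: int = 8, channels: int = 3) -> int:
    colors_per_pixel = (2 ** bits_per_channel) ** channels  # 16,777,216 for RGB
    return colors_per_pixel ** (width * height)

# Finite, but staggering: even a 2x2 image has 2**96 possible states,
# and the exponent grows linearly with pixel count.
n = image_count(2, 2)
print(n.bit_length())  # bits needed just to write the count down
```

At any photographic resolution the count dwarfs the number of images humans will ever produce, so "images unlike anything anyone's ever seen" are guaranteed to exist.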