Every day I see people treat gen AI like a thinking human; Dijkstra's attitude about anthropomorphizing computers is vindicated even more.
That said, I think the author's use of "bag of words" here is a mistake. Not only does it have a real meaning in a similar area as LLMs, but I don't think the metaphor explains anything. Gen AI tricks laypeople into treating its token inferences as "thinking" because it is trained to replicate the semiotic appearance of doing so. A "bag of words" doesn't sufficiently explain this behavior.
The statement "no LLMs are thinking like humans" is logically equivalent to its converse, "no humans are thinking like LLMs"
And I do not believe we actually understand human thinking well enough to make that assertion.
Indeed, it is my deep suspicion that we will eventually achieve AGI not by totally abandoning today's LLMs for some other paradigm, but rather by embedding them in a loop with the right persistence mechanisms.
Given that LLMs are incapable of synthetic a priori knowledge while humans are capable of it, I would say that, as the tech currently stands, it's reasonable to make both of those statements.
The loop, or more precisely the "search", does the novel part of thinking; the brain is just optimizing this process. Evolution managed with the simplest possible model, copying with occasional errors, and in one run it made every one of us. The moral: if you scale the search, the model can be dumb.
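A toy sketch of that claim: the "model" below is the dumbest one imaginable, copying with occasional errors, and search alone does all the work of reaching the target (the target string and mutation rate are arbitrary choices for illustration):

```python
import random

random.seed(0)  # reproducible run

TARGET = "methinks"
ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def fitness(s: str) -> int:
    # Number of positions that already match the target.
    return sum(a == b for a, b in zip(s, TARGET))

def mutate(s: str, rate: float = 0.1) -> str:
    # "Copying with occasional errors": each character may be miscopied.
    return "".join(random.choice(ALPHABET) if random.random() < rate else c
                   for c in s)

best = "".join(random.choice(ALPHABET) for _ in TARGET)
while best != TARGET:
    child = mutate(best)
    if fitness(child) >= fitness(best):  # keep the copy if it's no worse
        best = child
print(best)  # reaches the target on search alone
```

Nothing in the loop "understands" the target; selection over dumb copies is enough.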
Let’s not underestimate the scale of the search that led to us, though, even though you may be right in principle. In addition to deep time on earth, we may well be just part of a tiny fraction of a universe-wide and mostly fruitless search.
Yea, bag of words isn’t helpful at all. I really do think that “superpowered sentence completion” is the best description. Not only is it reasonably accurate, it is understandable (everyone has seen an autocomplete function) and it’s useful. I don’t know how to “use” a bag of words. I do know how to use sentence completion. It also helps explain why context matters.
That's the thing: when you use an ask/answer mechanism, you are just writing a "novel" where "User:" asks and "personal coding assistant:" answers. But all the text goes into the autocomplete function, and the "toaster" outputs the most probable continuation according to that function.
It's useful, it's amazing, but as the original text says, thinking of it as "some intelligence with reasoning" makes us use the wrong mental models for it.
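A toy sketch of that framing, with a bigram counter standing in for the model (a real LLM is vastly more sophisticated, but the interface is the same): the whole conversation, role labels included, is one string, and the "model" just appends likely continuations.

```python
from collections import Counter, defaultdict

# Tiny toy corpus; bigram counts record which word tends to follow which.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

follows = defaultdict(Counter)
for a, b in zip(corpus, corpus[1:]):
    follows[a][b] += 1

def complete(words, n=5):
    words = list(words)
    for _ in range(n):
        options = follows.get(words[-1])
        if not options:
            break
        words.append(options.most_common(1)[0][0])  # most probable next word
    return " ".join(words)

# The whole "conversation", role labels included, is just one string of text.
transcript = "User: where did the cat sit ?\nAssistant: the cat"
print(complete(transcript.split()[-2:], n=4))
```

The "User:"/"Assistant:" structure is part of the prompt text, not part of the model.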
It's not just the pretraining, it's the entire scaffolding between the user and the LLM itself that contributes to the illusion. How many people would continue assuming that these chatbots were conscious or intelligent if they had to build their own context manager, memory manager, system prompt, personality prompt, and interface?
I agree 100%. Most people haven't actually interacted directly with an LLM before. Most people's experience with LLMs is ChatGPT, Claude, Grok, or any of the other tools that automatically handle context, memory, personality, temperature, and are deliberately engineered to have the tool communicate like a human. There is a ton of very deterministic programming that happens between you and the LLM itself to create this experience, and much of the time when people are talking about the ineffable intelligence of chatbots, it's because of the illusion created by this scaffolding.
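A minimal sketch of that scaffolding, with a hypothetical `raw_complete` standing in for the bare model call; everything that feels like a "person" here is ordinary, deterministic glue code:

```python
# Hedged sketch: `raw_complete` is a hypothetical stand-in for a bare model
# call; a real one would return the model's continuation of the prompt.
def raw_complete(prompt: str) -> str:
    return "I'm doing well, thanks for asking!"  # canned, for illustration

SYSTEM = "You are a friendly, helpful assistant named Ada."

class Chatbot:
    def __init__(self):
        self.history = []  # "memory": prior turns, replayed on every call

    def ask(self, user_msg: str) -> str:
        self.history.append(f"User: {user_msg}")
        # Persona, memory, and turn structure all live in this string.
        # None of it is inside the model.
        prompt = SYSTEM + "\n" + "\n".join(self.history) + "\nAssistant:"
        reply = raw_complete(prompt)
        self.history.append(f"Assistant: {reply}")
        return reply

bot = Chatbot()
print(bot.ask("How are you today?"))
```

Strip this wrapper away and what's left is a stateless text-completion function.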
Bag of words is actually the perfect metaphor. The data structure is a bag. The output is a word. The selection strategy is opaquely undefined.
> Gen AI tricks laypeople into treating its token inferences as "thinking" because it is trained to replicate the semiotic appearance of doing so. A "bag of words" doesn't sufficiently explain this behavior.
Something about there being significant overlap between the smartest bears and the dumbest humans. Sorry you[0] were fooled by the magic bag.
[0] in the "not you, the layperson in question" sense
I think it's still a bit of a tortured metaphor. LLMs operate on tokens, not words. And to describe their behavior as pulling the right word out of a bag is so vague that it applies every bit as much to a Naive Bayes model written in Python in 10 minutes as it does to the greatest state of the art LLM.
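For the record, here's what a literal bag of words is in the classic NLP sense: word order is thrown away entirely, which is exactly what an LLM does not do.

```python
from collections import Counter

# A literal bag of words: order is discarded, only counts survive.
def bag(text: str) -> Counter:
    return Counter(text.lower().split())

a = bag("the dog bit the man")
b = bag("the man bit the dog")
print(a == b)  # True: opposite meanings, identical bags
```

Any metaphor under which those two sentences are indistinguishable can't explain what LLMs do.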
Yeah. I have a half-cynical/half-serious pet theory that a decent fraction of humanity has a broken theory of mind and thinks everyone has the same thought patterns they do. If it talks like me, it thinks like me.
Whenever the comment section takes a long hit and goes "but what is thinking, really" I get slightly more cynical about it lol
By now, it's pretty clear that LLMs implement abstract thinking - as do humans.
They don't think exactly like humans do - but they sure copy a lot of human thinking, and end up closer to it than just about anything that's not a human.
It isn't clear because they do none of that lol. They don't think.
It can kinda sorta look like thinking if you don't have a critical eye, but it really doesn't take much to break the illusion.
I really don't get this obsessive need to pretend your tools are alive. Y'all know when you watch YouTube that it's a trick and the tiny people on your screen don't live in your computer, right?
And how do you know that exactly? What is the source of that certainty? What makes you fully confident that a system that can write short stories and one-shot Python scripts and catch obscure pop culture references in text isn't "thinking" in any way?
The answer to that is the siren song of "AI effect".
Even admitting "we don't know" requires letting go of the idea that "thinking" must be exclusive to humans. And many are far too weak to do that.
Spoken Query Language? Just like SQL, but for unstructured blobs of text as a database and unstructured language as a query? Also known as Slop Query Language or just Slop Machine for its unpredictable results.
> Spoken Query Language? Just like SQL, but for unstructured blobs of text as a database and unstructured language as a query?
I feel that's more a description of a search engine. Doesn't really give an intuition of why LLMs can do the things they do (beyond retrieval), or where/why they'll fail.
If you want actionable intuition, try "a human with almost zero self-awareness".
"Self-awareness" used in a purely mechanical sense here: having actionable information about itself and its own capabilities.
If you ask an old LLM whether it's able to count the Rs in "strawberry" successfully, it'll say "yes". And then you ask it to do so, and it'll say "2 Rs". It doesn't have the self-awareness to know the practical limits of its knowledge and capabilities. If it did, it would be able to work around the tokenizer and count the Rs successfully.
That's a major pattern in LLM behavior. They have a lot of capabilities and knowledge, but not nearly enough knowledge of how reliable those capabilities are, or meta-knowledge that tells them where the limits of their knowledge lie. So, unreliable reasoning, hallucinations and more.
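The strawberry example in code: counting characters is trivial for a program that sees characters, but the model sees opaque subword tokens (the split below is illustrative only, not any particular tokenizer's actual output):

```python
# A program that sees characters finds this trivial:
print("strawberry".count("r"))  # 3

# But a model sees subword tokens, not characters. Illustrative split:
tokens = ["str", "aw", "berry"]
# To count Rs, the model must recall the spelling hidden inside each
# opaque token ID -- and know whether it can reliably do that at all.
```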
Agree that's a better intuition, with pretraining pushing the model toward saying "I don't know" in the kinds of situations where people write that, rather than through introspection of its own confidence.
There appears to be a degree of "introspection of its own confidence" in modern LLMs. They can identify their own hallucinations, at a rate significantly better than chance. So there must be some sort of "do I recall this?" mechanism built into them. Even if it's not exactly a reliable mechanism.
Anthropic has discovered that this is definitely the case for name recognition, and I suspect that names aren't the only things subject to a process like that.