> Isn’t hallucination just the result of speaking out loud the first possible answer to the question you’ve been asked?
No.
> In fact, if you observe your thinking…
There is no reason to believe that LLMs should be compared to human minds other than our bad and irrational tendency towards anthropomorphizing everything.
> So, to evaluate the intelligence of an LLM based on its first “gut reaction” to a prompt is probably misguided.
LLMs do not have guts and do not experience time. They are not some nervous kid randomly filling in a Scantron before the clock runs out. They are the product of software developers abandoning the half-century-plus tradition of making computers output correct answers and chasing vibes instead.
None of us experience time. Time is a way to describe cause and effect, and change. LLMs have a time when they are invoked with a prompt, and a time when they have generated output based on that prompt. LLMs don't experience anything, they're computer programs, but we certainly experience LLMs taking time. When we run multiple stages and techniques, each depending on the output of a previous stage, that sequence of dependent steps is time.
So when somebody says "gut reaction" they're trying to get you to compare straight probabilistic generation of text to your instinctive reaction to something. They're asking you to introspect and ask yourself whether you review that first instinctive reaction, i.e. run another stage afterwards that relies on the result of the instinctive one. If you do, then asking LLMs to do well in one pass, rather than using the first pass to guide the next passes, is asking for superhuman performance.
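To make the multi-stage point concrete, here's a minimal sketch of what "using the first pass to guide the next pass" looks like. The `generate` function below is a hypothetical placeholder for whatever single LLM invocation you actually use, not any real API; the only thing the sketch shows is that the second stage consumes the output of the first, which is the "review your gut reaction" structure being described.

```python
def generate(prompt: str) -> str:
    """Placeholder for a single LLM invocation (API call, local model, etc.)."""
    raise NotImplementedError("wire this up to your model of choice")


def answer_with_review(question: str) -> str:
    # Pass 1: the unreflective "gut reaction" -- straight one-shot generation.
    draft = generate(f"Answer the question:\n{question}")

    # Pass 2: a later stage that depends on the output of the first,
    # analogous to reviewing your own instinctive reaction before speaking.
    review_prompt = (
        f"Question: {question}\n"
        f"Draft answer: {draft}\n"
        "Check the draft for errors and return a corrected final answer."
    )
    return generate(review_prompt)
```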
I feel like this is too obvious to be explaining. Anthropomorphizing things is worth bitching about, but anthropomorphizing human languages and human language output is necessary and not wrong. You don't have to think computer programs have souls to believe that running algorithms over human languages to produce free-form output that is comprehensible and convincing to humans requires comparisons to humans. Otherwise, you might as well be lossily compressing music without referring to ears, or video without referring to eyes.
Yep. The analogy is bad even with that punctuation.
> None of us experience time.
That is not true, and it would only be worthy of discussion if we had agreed that comparing human experience to LLMs predicting tokens was worthwhile (which I emphatically have not done).
> You don't have to think computer programs have souls to believe that running algorithms over human languages to produce free output that is comprehensible and convincing to humans requires comparisons to humans.
This is true. You also don’t have to think that comparing this software to humans is required. That’s a belief that a person can hold, but holding it strongly does not make it an immutable truth.