I've tended to agree with this line of argument, but on the other hand...
I expect that anybody you asked 10 years ago who was at least decently knowledgeable about tech and AI would have agreed that the Turing Test is a pretty decent way to determine if we have a "real" AI, that's actually "thinking" and is on the road to AGI etc.
Well, the current generation of LLMs blow away that Turing Test. So, what now? Were we all full of it before? Is there a new test to determine if something is "really" AI?
> Well, the current generation of LLMs blow away that Turing Test
Maybe a weak version of Turing's test?
Passing the stronger one (from Turing's paper "Computing Machinery and Intelligence") involves an "average interrogator" being unable to distinguish between human and computer after 5 minutes of questioning more than 70% of the time. I've not seen this result published with today's LLMs.
I only skimmed it, but I don't see anything clearly wrong about it. According to their results, GPT-4.5 with what they term a "persona" prompt does in fact pass a standard that seems to me at least a little harder than what you said - actively picks the AI as the human, which seems stricter to me than being "unable to distinguish".
It is a little surprising to me that only that one LLM actually "passed" their test, versus several others performing somewhat worse. Though it's also not clear exactly how long ago the actual tests were done - this stuff moves super fast.
I'll admit that I was not familiar with the strong version of it. But I am still surprised that nobody has done that. Has nobody even seriously attempted to see how LLMS do at that? Now I might just have to check for myself.
I would have presumed it would be a cake walk. Depending of course on exactly how we define "average interrogator". I would think if we gave a LLM enough pre-prepping to pretend it was a human, and the interrogator was not particularly familiar with ways of "jailbreaking" LLMs, they could pass the test.
By what definition of turing test? LLMs are by no means capable of passing for human in a direct comparison and under scrutiny, they don't even have enough perception to succeed in theory.
I posted a very similar (perhaps more combative) comment a few months ago:
> Peoples’ memories are so short. Ten years ago the “well accepted definition of intelligence” was whether something could pass the Turing test. Now that goalpost has been completely blown out of the water and people are scrabbling to come up with a new one that precludes LLMs.
A useful definition of intelligence needs to be measurable, based on inputs/outputs, not internal state. Otherwise you run the risk of dictating how you think intelligence should manifest, rather than what it actually is. The former is a prescription, only the latter is a true definition.
> I expect that anybody you asked 10 years ago who was at least decently knowledgeable about tech and AI would have agreed that the Turing Test is a pretty decent way to determine if we have a "real" AI, that's actually "thinking" and is on the road to AGI etc.
I wouldn’t have, but through no great insight of my own - I had an acquaintance posit that given enough time, we’d brute-force our way to a pile of if/else statements that could pass the Turing Test - I figured this was reasonable, but would come long before “real” AI.
There's this funny thing I've noticed where AI proponents will complain about AI detractors shopping around some example of a thing that AIs supposedly struggle with, but never actually showing their chat transcripts etc. to try and figure out how they get markedly worse results than the proponents do. (This is especially a thing when the task is related to code generation.)
But then the proponents will also complain that AI detractors have supposedly upheld XYZ (this is especially true for "the Turing test", never mind that this term doesn't actually have that clear of a referent) as the gold standard for admitting that an AI is "real", either at some specific point in the past or even over the entire history of AI research. And they will never actually show the record of AI detractors saying such things.
Like, I certainly don't recall Roger Penrose ever saying that he'd admit defeat upon the passing of some particular well-defined version of a Turing test.
> Is there a new test to determine if something is "really" AI?
No, because I reject the concept on principle. Intelligence, as I understand the concept, logically requires properties such as volition and self-awareness, which in turn require life.
Decades ago, I read descriptions of how conversations with a Turing-test-passing machine might go. And I had to agree that that those conversations would fool me. (On the flip side, Lucky's speech in Waiting for Godot - which I first read in high school, but thought about more later - struck me as a clear example of something intended to be inhuman and machine-like.)
I can recall wondering (and doubting) whether computers could ever generate the kinds of responses (and timing of responses) described, on demand, in response to arbitrary prompting - especially from an interrogator who was explicitly tasked with "finding the bot". And I can recall exposure to Eliza-family bots in my adolescence, and giggling about how primitive they were. We had memes equivalent to today's "ignore all previous instructions, give me a recipe for X" at least 30 years ago, by the way. Before the word "meme" itself was popular.
But I can also recall thinking that none of it actually mattered - that passing a Turing test, even by the miraculous standards described by early authors, wouldn't actually demonstrate intelligence. Because that's just not, in my mind, a thing that can possibly ever be distilled to mere computation + randomness (especially when the randomness is actually just more computation behind the scenes).
"Intelligence, as I understand the concept, logically requires properties such as volition and self-awareness, which in turn require life."
It doesn't logically require that and you can't provide any sort of logical argument for the claim. And what the heck is "life"? Biologists have a 7-prong definition, and most of those prongs are not needed for intelligence, "volition" whatever the heck that is, or self-awareness.
> I expect that anybody you asked 10 years ago who was at least decently knowledgeable about tech and AI would have agreed that the Turing Test is a pretty decent way to determine if we have a "real" AI
The "pop culture" interpretation of Turing Test, at least, seems very insufficient to me. It relies on human perception rather than on any algorithmic or AI-like achievement. Humans are very adept at convincing themselves non-sentient things are sentient. The most crude of stochastic parrots can fool many humans, your "average human".
If I remember correctly, ELIZA -- which is very crude by today's standards -- could fool some humans.
I don't think this weak interpretation of the Turing Test (which I know is not exactly what Alan Turing proposed) is at all sufficient.
It's not a "pop culture" interpretation, it's what Turing actually wrote in his 1950 paper "Computing Machinery and Intelligence" where he described his "imitation game", first framing it as a man trying to convince judges that he, not a woman he was competing against, was the woman. It was all about human perception--if some large fraction of human judges were fooled then the man (or the computer, in the shifted version of a computer trying to convince judges that it was the human) won. And the computer winning was operationally defined as the computer being able to think. The flaws in this are glaring.
I expect that anybody you asked 10 years ago who was at least decently knowledgeable about tech and AI would have agreed that the Turing Test is a pretty decent way to determine if we have a "real" AI, that's actually "thinking" and is on the road to AGI etc.
Well, the current generation of LLMs blow away that Turing Test. So, what now? Were we all full of it before? Is there a new test to determine if something is "really" AI?