> What's the relevance of the Turing test? It's been beaten for over half a cent...

godelski · 2025-03-03T20:27:59 1741033679

  ELIZA
  PARRY
  Eugene Goostman
  Mitsuku (Kuki AI)

Or see the Loebner Prize, where the judges aren't just average people but experts (so it's harder).

The Turing Test was never about intelligence or thinking, as Turing himself said. He made the test because those words are too vague. He specifically wanted to shift the conversation to task completion, since that is actually testable. Great! That's how science works! (Looking at you, String Theory...) But science also progresses. We now know more about what intelligence and thinking are. They are still not testable in a concrete sense, but we have better proxies than the Turing Test.

The problem is that knowledge percolates through society slowly and with a lag. We've advanced a lot since then. I'm sure you are likely willing to believe that most LLMs these days can pass it. The Turing Test was a great start. You gotta start somewhere. But to think we came up with good tests at the same time we invented electronic computers, should give you surprise. Because it would require us to have been much smarter then than we are now.

eru · 2025-03-04T00:11:50 1741047110

> I would be very interested if you have any sources on anyone beating the Turing test in anything close to Turing's original adversarial formulation.

Eliza never won that adversarial version, even against laymen.

In what sense did Eliza ever 'win' any Turing test?

> I'm sure you are likely willing to believe that most LLMs these days can pass it.

No, I haven't seen any evidence of that.

To repeat: I am interested in evidence that any non-human can beat the Turing test in the original form given in Turing's paper, where you have the judge (human), and two contestants A and B. One of the contestants is a computer, one is a human. Everyone can see what everyone else is writing, and the human contestant can help the human judge. (But the computer can try to fake that 'helping', too.)

Turing specifically wrote: "The object of the game for the third player (B) is to help the interrogator."

I can believe that Eliza has occasionally fooled some random humans, but I can't believe Eliza managed to fool anyone when a third party was around to point out her limitations. (Especially since Eliza ain't smart enough to retaliate and fabricate some 'obvious computer limitations' to accuse the third party of.)

Most LLMs today still have some weaknesses that are easy to point out, if you let your contestants (both kinds) familiarise themselves with both humans and the LLMs in question at their leisure before the test starts.

Just for fun, I just tried out Kuki AI, and it's not going to fool anyone who actually wants to uncover the AI through adversarial cross-examination.

The chat excerpt of 'Eugene Goostman' given in https://en.wikipedia.org/wiki/Eugene_Goostman also suggests that it would fall apart immediately in an adversarial setting with the full three participants.

However, I do agree that we have made progress and that today's LLMs could hold up a lot longer in this harsher setting than anything we had before. Especially if you fine-tuned them properly to remove telltale signs like their inability to swear or their constant politeness.

> But to think we came up with good tests at the same time we invented electronic computers, should give you surprise.

I never claimed the Turing test is the best test ever, nor even that it's particularly good. I was saying that it hasn't been beaten in its original form.

For example, the Turing test isn't really a fine-grained benchmark that lets you measure and compare model performance to multiple decimal places. Nor was it any good as a guideline for how to improve our approaches.

godelski · 2025-03-04T09:47:09 1741081629

https://en.wikipedia.org/wiki/Loebner_Prize

eru · 2025-03-06T01:39:07 1741225147

Thanks, that seems to be reasonably close to Turing's rules. And predictably: no program ever convinced the judges that it was human and the real human competitor was a computer. No program passed the Turing test.

> In addition, there were two one-time-only prizes that have never been awarded. $25,000 is offered for the first program that judges cannot distinguish from a real human and which can convince judges that the human is the computer program. $100,000 is the reward for the first program that judges cannot distinguish from a real human in a Turing test that includes deciphering and understanding text, visual, and auditory input. The competition was planned to end after the achievement of this prize.