Too many googleable questions (like who was the president). Too few 'understanding'-type questions like 'Why don't animals have three legs?'
In addition to nonsense questions, I think it would be pretty easy to knock it over with some deeper questions about things they were already talking about. Like asking 'Why don't chairs with 3 legs fall over then?'
To be fair (as one who hasn't paid too much attention to these developments in the last few months/years), the sheer amount of individual facts that are apparently encoded in a fixed nn architecture is what amazes me the most here?
I get that it has 175B parameters, but it's not like you can use them in a structured way to store, say, a knowledge base...
It's ~650GiB of data. The entire English version of Wikipedia is about 20GiB and the Encyclopædia Britannica is only about 300MiB (measured in ASCII characters) [1].
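The ~650GiB figure follows from simple arithmetic if one assumes the 175B parameters are stored as 32-bit floats (the actual on-disk format isn't stated here, so fp32 is an assumption):

```python
# Back-of-envelope check for the sizes quoted above.
# Assumption: each parameter is an fp32 value (4 bytes); other
# formats (fp16, quantized) would halve or quarter this.
PARAMS = 175e9
BYTES_PER_PARAM = 4  # fp32 assumption

total_bytes = PARAMS * BYTES_PER_PARAM
gib = total_bytes / 2**30
print(f"model: ~{gib:.0f} GiB")  # ~652 GiB, i.e. the ~650GiB above

# Compare with the text corpora mentioned in the thread
wikipedia_gib = 20            # English Wikipedia, ASCII characters
britannica_gib = 300 / 1024   # Encyclopædia Britannica, 300 MiB
print(f"model / Wikipedia: ~{gib / wikipedia_gib:.0f}x")
```

So under that assumption the weights alone are roughly 30x the size of English Wikipedia's article text.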
I only use the sources available to me at the time.
Another value published by Wikipedia is 30GiB [1] as of 2020, which includes punctuation and markup.
I explicitly put the measurement unit as ASCII characters. If you have a better source for your size (remember: ASCII characters for article words only, no markup), feel free to post it.
The test originated from the Imitation Game. Turing envisioned a game in which two people (A & B) are tasked to pretend to be the other (not at the same time, though).
An interviewer is then trying to decide who of them is actually the one they pretend to be by asking questions.
In order to remove any hints from appearance, voice, or handwriting, a typewriter is used for interacting.
For example A could be male and B could be female and A is asked to pretend to be B. The interviewer can then ask questions to decide whether A is male or female (yes, I am fully aware that in 2020 this: https://youtu.be/z2_8cfVpXbo?t=129 could also happen).
Turing then proposed to replace A or B with a computer instead and ask the interviewer to decide which of them is the human.
In this scenario, do you really think the interviewer would bombard the candidates with trivia questions and logic puzzles to find out?
The idea was that other aspects of human intelligence, like a sense for beauty, poetry, etc. would be used instead to differentiate between the two.
Questions like "Do you enjoy reading Harry Potter?" and, depending on the answer, you could further ask why the subject likes or dislikes the books.
This would be much more insightful and coming up with such questions doesn't require any particular skill on part of the interviewer.
You can even get tricky and return to the topic after talking about something else to see whether you get a similar response or to test the subject's memory.
That's exactly why I said "it seems very dependent on the skill and intelligence of the human tester."
A test "pass" or "fail" could potentially be the fault of the tester as much as it is a sign of intelligence in the AI. How do you evaluate how capable someone is of administering a worthwhile Turing Test?
Maybe they should be tested beforehand. I propose a system involving a remote teletype machine and human tester...
The test itself doesn't rely on one person, though. It's a statistical measure, so for each "game" you have a number of interviewers/judges who test the system.
If the system has any of them fooled, the test can be considered "passed". AFAIK there's no precise number attached to this, so it's not like ">x% people fooled => pass".
So in essence it's not "the human tester" but "the average human tester" for some arbitrary definition of "average".
It's a really interesting dilemma if you think about it: is a forger good if they're able to fool everyone but experts? Does an AI pass the Turing test if it fools the general public, but every AI expert who knows the system would be able to tell after just a few sentences?
Turing explains that the machine could not give an "appropriately meaningful" answer to whatever is said in its presence, which the "dullest of men" could do.
So the basic quality he seeks in a machine that could pass the so-called Turing test is a machine that produces meaning. That meaning comes from a mind that is intact or coherent. So unless we take the leap from making category deciders and inference machines to actually building minds, we won't get anywhere near his definition of a machine that is truly intelligent.
I know people who talk like this, and the user history checks out. Either way, the comment is misinforming: those quotes are from Descartes, not Turing, whose views on whether machines could think were the opposite.
https://plato.stanford.edu/entries/turing-test
Well, I will dismiss all the nitpicky npc nonsense here and just repeat myself.
Are we trying to build minds? Or is all the effort going into building automated money savers that take away trivial jobs? That is the main point here.
Turing only proposed his game as a hypothetical argument, so he barely specified any rules on how to actually perform such a test in practice. He referred to the judging party either as "the man" or "the average interrogator"; he never specified a number. Official Turing test events have had anywhere from four to hundreds of people judging.