The Myth of the Turing Test

Over 60 years ago, Alan Turing ("a brilliant mathematician") published a paper in which he suggested a practical alternative to the question "Can machines think?". His alternative took the form of a parlour game, in which a judge has a text-based conversation with both a computer and a human, and has to guess which is which. He called this "The imitation game", and it has been misinterpreted ever since as a scientific test of intelligence, redubbed "The Turing Test".

A little less conversation, a little more action please
It might surprise you that the question so often attributed to Alan Turing, "Can machines think?", was not his, but a public question that he criticized:

I propose to consider the question, "Can machines think?" - If the meaning of the words "machine" and "think" are to be found by examining how they are commonly used, - the answer to the question is to be sought in a statistical survey. But this is absurd. Instead of attempting such a definition I shall replace the question by another.
-
"Are there imaginable digital computers which would do well in the imitation game?"
-
The original question, "Can machines think?" I believe to be too meaningless to deserve discussion.

Turing's motivation was apparent throughout the paper: the question had been the subject of endless theoretical discussion and nay-saying (this is still the case today). As this did not help the field advance, he suggested turning the discussion to something more practical. He used the concept of his imitation game as a guideline to counter stubborn arguments against machine intelligence, and urged his colleagues not to let those objections hold them back.

I do not know what the right answer is, but I think both approaches should be tried.
We can only see a short distance ahead, but we can see plenty there that needs to be done.

A test of unintelligence
Perhaps the most insightful part of the paper is the set of sample questions that Turing suggested. They were chosen deliberately to represent skills that were at the time considered to require intelligence: math, poetry and chess. It wasn't until the victory of chess computer Deep Blue in 1997 that chess stopped being regarded as a feat of intelligence. If this were a test to demonstrate and prove the computer's intelligence, why then are the answers below wrong?

Q: Please write me a sonnet on the subject of the Forth Bridge.
A: Count me out on this one. I never could write poetry.
Q: Add 34957 to 70764.
A: (Pause about 30 seconds and then give as answer) 105621.
Q: Do you play chess?
A: Yes.
Q: I have K at my K1, and no other pieces. You have only K at K6 and R at R1. It is your move. What do you play?
A: (After a pause of 15 seconds) R-R8 mate.

To the poetry question, the imaginary computer might as well have written a sonnet and so proven itself intelligent (a sonnet is a 14-line poem with a very specific rhyme scheme). Instead it dodges the question, proving nothing.
The math outcome should be 105721, not 105621. Turing later highlights this as a counterargument to "Machines cannot make mistakes", which is the awkward yet common argument that machines only follow preprogrammed instructions without consideration.

The machine (programmed for playing the game) would not attempt to give the right answers to the arithmetic problems. It would deliberately introduce mistakes in a manner calculated to confuse the interrogator.
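
As a quick check of that arithmetic, here is a minimal Python sketch. The variable names are mine, not Turing's, and it does nothing more than confirm the sum and the size of the slip in the sample answer.

```python
# Operands and answer as printed in the sample dialogue.
a, b = 34957, 70764
quoted_answer = 105621

# What the sum actually comes to.
correct_answer = a + b

print(correct_answer)                  # 105721
print(correct_answer - quoted_answer)  # 100
```

The quoted answer is off by exactly 100, a single wrong digit in the hundreds place, which is the kind of slip a person adding by hand might plausibly make.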

The chess answer is not wrong though. Given two kings and one rook on a board (the "R" in the notation stands for rook), the computer moves the rook to the king's row. But a mere child could have given that answer, as it is the only move that makes any sense.

These sample answers pass up every opportunity to appear intelligent. One can argue that the intelligence is ultimately found in pretending to be dumb, but one cannot deny that this conflicts directly with the purpose of a test of intelligence. Rather than showing that the machine matches "the intellectual capacities of man" in all aspects, the answers only show it failing at them, as most humans would at these questions. Clearly then, the imitation game is not for demonstrating intelligence.

The rules: There are no rules
The first misinterpretation one encounters is that the computer should pretend to be a woman specifically, going by Turing's initial outline of the imitation game concept, in which a man has to pretend to be a woman:

It is played with three people, a man (A), a woman (B), and an interrogator -
What will happen when a machine takes the part of A in this game?

However, I suggest that people who believe this read beyond the first paragraph. There are countless instances where Turing refers to both the computer's behaviour and its opponent's as that of "a man". Gender has no bearing on the matter since the question is one of intellect.

Is it true that - this computer - can be made to play satisfactorily the part of A in the imitation game, the part of B being taken by a man?

The second misinterpretation is that Turing specified a benchmark for a test:

It will simplify matters for the reader if I explain first my own beliefs in the matter. -
I believe that in about fifty years' time it will be possible, to program computers - to make them play the imitation game so well that an average interrogator will not have more than 70 per cent chance of making the right identification after five minutes of questioning.
- I now proceed to consider opinions opposed to my own.

Five-minute interrogations and a (100% - 70% =) 30% chance of misidentifying the computer as a human: many took these to be the specifications of a test, because they are the only numbers mentioned in the paper. This interpretation was strengthened by the hero-worship that holds anything a genius says to be a matter of fact.
Others feel that the bar Turing set is too low for a meaningful test and brush his words aside as a "prediction". Yet at the time there was no A.I. to base any predictions on, and Alan Turing did not consider himself a clairvoyant. In a later BBC interview, Turing said it would be "at least 100 years, I should say" before a machine would stand any chance in the game, where earlier he mentioned 50 years. One can hardly accuse these "predictions" of being attempts at accuracy.

Instead of either interpretation, you can clearly read that the 5 minutes and 70/30% chance are labeled as Alan Turing's personal beliefs in possibilities. His opinion, his expectations, his hopes, not rules to a test. He was sick and tired of people saying it couldn't be done, so he was just saying it could.

On the subject of benchmarks, it should also be noted that under normal circumstances the computer has at best a 50% chance of winning, i.e. no better than random: if the computer and the human both seem perfectly human, the judge still has to flip the proverbial coin at 50/50 odds. That the judge is aware of having to choose is clear from the initial parlour game between man and woman, and the same holds between human and computer, or it would defeat the purpose of interrogation:

The object of the game for the interrogator is to determine which of the other two is the man and which is the woman.

How well would men do at pretending to be women? Less than 50/50 odds, I should think.
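
To put rough numbers on the coin-flip argument, here is a small Python sketch. It rests on a toy model of my own, not anything from Turing's paper: with some probability the judge spots a genuine giveaway, and otherwise guesses at random between the two candidates. The function names are hypothetical.

```python
def correct_identification_rate(p_detect: float) -> float:
    """Chance that the judge names the machine correctly.

    Assumed toy model: with probability p_detect the judge spots a real
    giveaway; otherwise the judge flips a coin between the two candidates.
    """
    if not 0.0 <= p_detect <= 1.0:
        raise ValueError("p_detect must be between 0 and 1")
    return p_detect + (1.0 - p_detect) * 0.5


def detection_rate_for(correct_rate: float) -> float:
    """Invert the model: the p_detect that yields a given correct rate."""
    if not 0.5 <= correct_rate <= 1.0:
        raise ValueError("correct_rate must be between 0.5 and 1")
    return 2.0 * correct_rate - 1.0


# A machine with no giveaways at all still gets identified half the time.
print(correct_identification_rate(0.0))   # 0.5

# Turing's 70 per cent figure, read through this model.
print(round(detection_rate_for(0.7), 2))  # 0.4
```

Under this assumption, a judge who is right 70% of the time is only really catching the machine out in about 40% of conversations; the rest of the correct calls are lucky guesses. The model deliberately ignores the case where the machine seems more human than the human, which is the only way its odds could climb above 50%.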

Looks like a test, quacks like a test, but flies like a rock
Not only are the rules for passing completely left up to interpretation, but also the manner in which the game is to be played. Considering that Turing was a man of exact science and that his other arguments in the paper were extremely elaborate, would he define a scientific test so vaguely? We find the answer in the fact that Turing mainly refers to his proposal as a "game" and "experiment", but rarely as a "test". He makes no mention of "passing" and even explains that it is not the point to try it out:

it may be asked, "Why not try the experiment straight away? -" The short answer is that we are not asking whether the computers at present available would do well, but whether there are imaginable computers which would do well.

The pointlessness has been proved in practice: yes, several chatbots have passed various interpretations of the game, most notably Eugene Goostman in 2014, and even Cleverbot passed one based on audience vote. But did an intelligent program ever pass? No. Although nobody can agree on what intelligence is, everybody, including the creators, does agree that the programs that passed weren't intelligent; they worked mainly through keyword-triggered responses.

Winning isn't everything
Although Turing did seem to imagine the game as a battle of wits, ultimately its judging criterion is not how "intelligent" an A.I. is, but how "human" it seems. In reality, humans are much more characterised by their flaws, emotions and irrational behaviour than by their intelligence in conversation, and so a highly intelligent rational A.I. would ironically not do well at this game.

In the end, Turing Tests are behaviouristic assumptions, drawing conclusions from appearances like doctors in medieval times. By the same logic one might conclude that a computer has the flu because it has a high temperature and is making coughing sounds. Obviously this isn't a satisfying analysis. We could continue to guess whether computers are intelligent due to the fact that they can do math, play chess or have conversations, or we could do what everybody does anyway once a computer passes a test: ask "How does it work?", then decide for ourselves how intelligent we find that process. No question could be more scientific or more insightful.

So, where does that leave "The Turing Test" when it was never an adequate test of intelligence, nor meant to be? Personally I think Turing Tests are still suitable to demonstrate the progression of conversational skills, a challenge becoming more important with the rise of social robots. And it is important that the public stay informed to settle increasing unrest about artificial intelligence. Other than that, I think it is time to lay the interpretations to rest and continue building A.I. that Alan Turing could only dream of.
In ending, more than any technical detail, I ask you to consider Turing's hopes:

Nevertheless I believe that at the end of the century the use of words and general educated opinion will have altered so much that one will be able to speak of machines thinking without expecting to be contradicted.
