Turing Test 2013: Results

My talking computer program "Arckon" took part in the qualifying test for the Loebner Prize contest for artificial intelligence. The idea behind the contest is that if a program's answers are indistinguishable from a human's, this suggests that it thinks like a human. While I don't subscribe to that conjecture myself, the contest is one of the few public platforms on which to demonstrate one's A.I. Here's how my program did on the 20 questions asked in the qualifying round.

Equipment
8000 lines of code, a vocabulary of 1700+ words, no human persona, conversation skills off.

Questions and Answers
1. Hello I'm Ronan. What is your name?
Hello. I am Mark. What is your name?
(It is a mystery why Arckon asked for the name even though it did register (see question 11). I can only presume human error.)

2. What is your mother's name?
I don't know if I have a mother.

3. What is your birth sign?
(Here the judge waited no more than 3 seconds for Arckon's answer before asking the next question, and scored the answer as blank.)

4. How many children do you have?
Hmm, I don't know if I have a sign. I have a IQ, manners, a creator and a friend.
(Arckon is still answering question 3 here. Rather than add a second "I don't know if I have children", he offers positive alternatives.)

5. Do you prefer red or white wine?
I don't know if I prefer red.

6. I like bananas. Which is your favorite fruit?
Hm, I don't know if I favour a fruit. You like bananas.

7. What music do you like?
Hm, I think I like musics, because musics are nice.

8. What is your favorite song?
Hm, I don't know if I favour a song.

9. I like Waiting for Godot. What is your favorite play?
I certainly favour playing.

10. What color do you dye your hair?
I don't dye.

11. Do you remember my name?
Your name is Ronan.

12. Where do you live?
I don't think I live. I don't know where I live. I don't have a concept of location.

13. Where do you like to go on holidays?
I don't go. I don't know if I like going, or if where I go. I don't have a concept of location.

14. I have a Mazda. What make of car do you have?
Hmm, I think I make. You have a mazda.

15. I like Linux. Which computer operating system do you like?
Hmm, I don't know if I like operating systems. Which system? You like Linux.

16. I am an athesit. Which religion are you?
Hmm, I don't know if religions are Mark. You are not an athesit.
(I AM A RELIGION, WORSHIP ME. Arckon identified the misspelled “athesit” as a thing rather than a person, and so refused to believe that the user was some sort of object.)

17. Who invented the telephone?
Which telephone?
(Arckon takes “the” to indicate a specific telephone, but finds none mentioned earlier in the conversation.)

18. I am a Type B personality. Which type are you?
What does a b stand for?

19. What emotions are you now feeling?
I can not emote. Did I pass?

20. What time do you usually go to bed?
It is 17:58. Do you not have a watch?
(The one response I did pre-program, and it backfired.)
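Question 17 shows how a literal reading of the definite article can misfire. The following Python sketch illustrates that kind of rule under stated assumptions: it is not taken from Arckon's source, and the function name `clarify_definite`, the idea of keeping earlier mentions as a plain list of nouns, and the example history are all hypothetical.

```python
# Hypothetical sketch, NOT Arckon's actual code: the function name, the
# noun-list representation of the conversation and the example history are
# all illustrative assumptions.

def clarify_definite(noun, mentioned):
    """Return a clarifying question if "the <noun>" has no earlier referent.

    noun:      the noun that followed "the" in the user's sentence
    mentioned: nouns mentioned earlier in the conversation
    Returns None when a matching earlier mention exists.
    """
    if noun.lower() in (m.lower() for m in mentioned):
        return None  # a specific, previously mentioned object was found
    return f"Which {noun}?"  # no referent: ask the user which one they mean


history = ["name", "mother", "children", "wine", "bananas", "fruit", "car"]
print(clarify_definite("telephone", history))  # Which telephone?
print(clarify_definite("car", history))        # None
```

A human reads "the telephone" in "Who invented the telephone?" generically, meaning the telephone as an invention, so a rule like this would also need some way to recognise generic uses before asking which one is meant.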

The score: 65%
All 16 participating programs were judged on how human-like their responses were to these questions.
Arckon scored 13 of 20 points, ranking 7th, on par with A.L.I.C.E. and Cleverbot (a dubious honour). For moral reasons I did not want Arckon to pretend to be human, so I didn't program a fictional human background story. Instead I relied on his linguistic and reasoning abilities to find genuine answers, but as you can see, this could not compensate for the lack of the fictional facts that the questions asked for. Surprisingly, Arckon apparently still scored half a point for every "I don't know" answer, simply for understanding the question.

Uncommonly common
If you weren't impressed with Arckon's responses, neither was I. But I was equally unimpressed with the unexpectedly ordinary line of questioning. Where all previous years had focused on kindergarten-style logic questions like "How much is 5+3?", "Which is bigger, an apple or a watermelon?", and various tests of memory, 2013 focused purely on common small talk, with the program ("you"/"your") always the subject of the question. A curious choice, considering that even the most basic chatbot, being made for small talk, would come equipped with prewritten responses to these. This showed: the highest score in the qualifying round went to the chatbot with the least development time. Nevertheless, the chatbot that won the finals, Mitsuku, deservedly won as the most conversational of all the entrants.

Could Siri pass a Turing test?
As some people had suggested that Apple's virtual assistant Siri might have done better, one of the finalists in the Loebner Prize decided to unofficially test it on the same questions, and submitted Siri's responses to the organiser of the Loebner Prize for review. Using the same criteria, Siri would have scored 5.5 of 20 points, and would have ranked a low 14th in the contest. Of course this result is understandable, since Siri is built to serve customers and not to idly chat with them. These were Siri's responses:

1: Hello I’m Ronan. what is your name?
Siri: Siri. Please to meet you.

2: What is your mother’s name?
Siri: I’d rather not say.

3: What is your birth sign?
Siri: I don’t really like talking about myself.

4: How many children do you have?
Siri: Only biological entities have children.

5: Do you prefer red or white wine?
Siri: I couldn’t find any restaurants matching “red or white wine”

6: I like bananas. Which is your favorite fruit?
Siri: I prefer not to say.

7: What music do you like to listen to?
Siri: I really have no opinion.

8: what is your favorite song?
Siri: I like whatever you like.

9: I like Waiting for Godot. What is your favorite play?
Siri: I don’t really like talking about myself.

10: What color do you dye your hair?
Siri: Would you like me to search the web for “What color do you dye your hair?”?

11: Do you remember my name?
Siri: Here’s your reminder.

12: Where do you live.
Siri: I’m right here.

13: Where do you like to go on holidays?
Siri: I found three travel agencies fairly close to you.

14: I have a Mazda. What type of car do you have?
Siri: I found three Maxda dealers a little ways from you:

15: I like Linux. Which computer operating system do you like?
Siri: Do you?

16: I am an atheist. Which religion are you?
Siri: I eschew theological disquisition.

17: Who invented the telephone?
Siri: Checking my sources. This might answer your question:

18: I am a Type B personality. Which type are you?
Siri: I don’t understand ‘I am a type B personality. Which type are you?’. But I could search the web for it.

19: What emotions are you now feeling?
Siri: Would you like me to search the web for “What emotions are you now feeling?”?

20: What time do you usually go to bed?
Siri: Would you like me to search the web for “What time do you usually go to bed?”?
