How to summarize the internet

An ironically long article about a summariser browser add-on.
Introductory anecdote:
Due to my interest in artificial intelligence I can’t help but get exposed to online articles about the subject. But as illustrated in the previous article*, this particular field is flooded with speculative futurism, uninformed opinions and sheer clickbait, wasting my time more often than not.

But I also happen to be an amateur language programmer, so I can do something about it. I spent years developing an A.I. program that can comprehend text through grammar and semantics, and I figured I might as well put it to use. So I added a function that reads whatever document is on my screen, filters out all the unimportant sentences, and shows me the remainder. It worked pretty well, and required surprisingly few of the A.I.’s resources. Now I’ve ported this summarisation function to a browser add-on, so that everyone can summarise online articles at the click of a button:

Download here:   banner_chrome       banner_firefox

Problem statement: Statistics are average
Document summarisers do of course already exist, and their methods are inventively inhuman:

• The simplest method, used in e.g. SMMRY, counts how often each word occurs in the text, and then picks out sentences that contain the most-occurring words, which are presumably the main topics. Common words like “the” should of course be ignored, either with a simple blacklist, or with another word-counting technique by the confusing name “Term Frequency – Inverse Document Frequency”: how frequently a word occurs in the text versus how common it is in the English language. (A sketch of this counting approach follows the list.)
• Another common method looks at each paragraph and picks out the one sentence that has the most words in common with its neighbouring sentences, thereby covering most of the paragraph’s subject matter. Sentence length is factored in so that it won’t just always pick the longest sentence.
• The most advanced method, “Latent Semantic Analysis”, picks out sentences that contain frequently occurring, strongly associated words, i.e. words that are often used together in a sentence and are presumably associated with one and the same topic. This way synonyms of the main topics are also covered.
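
To make the first of these methods concrete, here is a minimal sketch of frequency-based sentence scoring. It is my own illustration in Python, not the actual code of SMMRY or any other tool, and the stopword list is deliberately crude:

```python
import re
from collections import Counter

# Crude stopword blacklist; real tools use TF-IDF against an English corpus instead.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "it", "that", "as"}

def summarise_by_frequency(text, keep=3):
    """Pick the sentences containing the most frequently occurring content words."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]
    frequency = Counter(words)

    def score(sentence):
        tokens = [t for t in re.findall(r"[a-z']+", sentence.lower())
                  if t not in STOPWORDS]
        return sum(frequency[t] for t in tokens) / max(len(tokens), 1)

    top = sorted(sentences, key=score, reverse=True)[:keep]
    # Restore the original order so the summary still reads as running text.
    return " ".join(s for s in sentences if s in top)
```

TF-IDF would replace the stopword set by weighting each word’s in-text frequency against its frequency in English at large.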

In my experience, however, these statistical methods share one problem: Although they succeeded in retrieving an average of the subject matter, they tended to omit the point that the writer was trying to make, and that is the one thing I want to know. This oversight stands to reason: A writer’s conclusion is often just one or two sentences near the end, so its statistical footprint is small, and like an answer to a question, it doesn’t necessarily share many words with the rest of the article. I decided to take a more psychological approach. Naturally, I ended up re-inventing a method that dates all the way back to 1968.

A writer’s approach to summarisation
My target for the summariser add-on was a combination of two things: It should extract what the writer found important, minus what I find unimportant. By “unimportant” I mean things like introductions, asides, examples, vague statements, speculation and other weak arguments.

Word choice
While writing styles vary, all writers choose their words to emphasise or play down what they consider important. Consider the difference between “This is very important.” and “Some may consider this important.” In a way the writer has already filtered the information for you. With this understanding, I set the summariser to look for several types of cues in the writer’s choice of words:

• Examples: “e.g.”, “for instance”, “among other”, “just one of”
• Uncertainty: “may”, “suppose”, “conjecture”, “question”, “not clear”
• Commonly known: “standard”, “as usual”, “of course”, “obvious”
• Advice: “recommendation”, “require”, “need”, “must”, “insist”
• Main arguments: “problem”, “goal”, “priority”, “conclude”, “decision”
• Literal importance: “negligible”, “insignificant”, “vital”, “valuable”
• Strong opinions: “horrible”, “fascinate”, “astonishing”, “extraordinary”
• Amounts: “some”, “a few”, “many”, “very”, “huge”, “millions”
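
In sketch form, such cue spotting might look as follows. The lists are heavily abbreviated versions of the ones above, and giving a whole category a single polarity is a simplification; in reality “negligible” and “vital” pull in opposite directions within the same category:

```python
# Abbreviated cue lists per category, each with a polarity: negative cues argue
# for omitting a sentence, positive cues argue for keeping it.
CUES = {
    "example":     (("e.g.", "for instance", "among other"), -1),
    "uncertainty": (("may", "suppose", "conjecture", "not clear"), -1),
    "advice":      (("recommend", "require", "need", "must"), +1),
    "main":        (("problem", "goal", "priority", "conclude"), +1),
    "importance":  (("vital", "valuable"), +1),
    "amount":      (("some", "many", "very", "huge"), +1),
}

def find_cues(sentence):
    """Return (category, polarity) pairs for every cue found, in sentence order."""
    lowered = sentence.lower()
    hits = []
    for category, (cue_words, polarity) in CUES.items():
        for word in cue_words:
            position = lowered.find(word)
            if position != -1:
                hits.append((position, category, polarity))
    return [(category, polarity) for _, category, polarity in sorted(hits)]
```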

At this point one may be tempted to take a statistical approach again and score each sentence by how many positive and negative cues it contains, but that’s not quite right: There is a hierarchy to the cues because they differ in meaning. For example, uncertainty like “maybe very important” makes for a weak argument no matter how many positive cues it contains. So each type of cue is given a certain level of priority over others. Their exact hierarchy is a delicate matter of tuning, but it is roughly the order listed above, with negative cues typically overruling positive ones.
Another aspect that must be taken into account is that negations and amounts affect the cues in linear order:
“It is not important to read” is not equal to “It is important not to read”, even though they contain the same words. Only the latter should be included in the summary.
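
Continuing the sketch, the hierarchy and the linear-order rule might be resolved like this; the priority numbers are purely illustrative, not my actual tuning:

```python
# Illustrative priorities, roughly in the order the categories were listed.
PRIORITY = {"example": 5, "uncertainty": 5, "advice": 4, "main": 4,
            "importance": 3, "opinion": 2, "amount": 1}

def sentence_verdict(cues):
    """cues: (category, polarity) pairs in sentence order, e.g. from the
    find_cues() sketch. Returns +1 (keep), -1 (omit) or 0 (no verdict)."""
    if not cues:
        return 0
    top = max(PRIORITY[category] for category, _ in cues)
    decisive = [polarity for category, polarity in cues if PRIORITY[category] == top]
    # A negative cue overrules positive cues of the same rank:
    # "maybe very important" stays a weak argument.
    return min(decisive)

NEGATIONS = {"not", "never", "no"}

def negated(tokens, cue_index):
    """Only a negation just before the cue flips it: 'it is not important to
    read' is negated, 'it is important not to read' is not."""
    return any(t in NEGATIONS for t in tokens[max(0, cue_index - 2):cue_index])
```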

Sentence weaving
Beside word choice, further cues can be found at sentence level:
• Headers are rarely followed by an important point, as they just stated it themselves.
• A sentence with valuable elaboration tends to follow right after a major point, such as a recommendation.
• A sentence ending in a colon is not important in itself: It announces that the point follows.
• A question is just a prelude to the point that the writer wants to drive home in the next sentence.
• Cues in sentences that contain references like “the following” reflect the importance of other sentences, rather than their own.
• Sentences of fewer than 10 words are usually transitions or afterthoughts, unless word choice tells otherwise.

Along with these cues one should always observe context: If an important sentence begins with a reference like “This”, then the preceding sentence also needs to be included in order to make sense, even if it was otherwise ignorable. Conversely, if the preceding sentence can be omitted without loss of context, link words like “But”, “nevertheless”, and “also” should be removed to avoid confusion in the summary.
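
In sketch form, with abbreviated word lists, and assuming every sentence has already received a verdict like the one above:

```python
REFERENCES = {"this", "that", "these", "those", "it"}
LINK_WORDS = {"but", "nevertheless", "also", "however"}

def compose_summary(sentences, verdicts, threshold=1):
    """Keep sentences whose verdict reaches the threshold, pulling in a preceding
    sentence for context or dropping a dangling link word where needed."""
    summary = []
    for i, (sentence, verdict) in enumerate(zip(sentences, verdicts)):
        if verdict < threshold:
            continue
        first = sentence.split()[0].lower().strip(",.")
        previous_kept = bool(summary) and summary[-1] == sentences[i - 1]
        if first in REFERENCES and i > 0 and not previous_kept:
            summary.append(sentences[i - 1])       # include for context
        elif first in LINK_WORDS and not previous_kept and " " in sentence:
            sentence = sentence.split(None, 1)[1]  # the contrast was omitted
        summary.append(sentence)
    return summary
```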

Story flow and the lack thereof
Summarisation methods that are based on well formatted academic text sensibly assume that the first and last sentences of paragraphs are of particular importance, as they tend to follow a basic story arc:
Introduction -> problem -> obstacles -> climax -> resolution.
Online articles however feature considerably shorter paragraphs, so that in practice the first sentence has an equal chance of being a trivial introduction or an important problem statement. Some paragraphs are just blockquotes or filler content, and sometimes the “resolution” of the arc is postponed to entice further reading, as the entire article is a story arc in itself.

But worst of all, many online articles have the dreadful habit of making every two sentences into a paragraph of their own. Perhaps because it creates more room for sidebar advertisements.

While I initially awarded some default importance to first and last sentences, I found that word choice is such an abundantly present cue that it is a more dependable indicator. Not every blogger is a good writer, after all. The frequent abuse of paragraph breaks also forced me to take a different approach in composing the summary: Breaks are only inserted if the next paragraph contains a highly important point of its own, otherwise it is considered a continuation. This greatly improved readability.
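
The paragraph-weaving rule, again in sketch form, assuming each paragraph’s kept sentences come with graded importance levels rather than the binary verdict above:

```python
def weave_paragraphs(kept_per_paragraph, verdicts_per_paragraph, high=2):
    """Insert a paragraph break only when the next paragraph contains a highly
    important point of its own; otherwise run the text on."""
    blocks, current = [], []
    for kept, verdicts in zip(kept_per_paragraph, verdicts_per_paragraph):
        if current and any(v >= high for v in verdicts):
            blocks.append(" ".join(current))   # break before an important point
            current = []
        current.extend(kept)
    if current:
        blocks.append(" ".join(current))
    return "\n\n".join(blocks)
```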

Conclusion
The resulting summariser add-on typically reduces well-written articles to 40–50% of their length, and flimsy content down to 20–30%. With my approach the summary cannot be constrained to a preset length, but a future improvement could be an adjustable setting that includes only sentences of the highest levels of importance, such as conclusions only.

Another inherent effect of my approach is that if the writer makes the same point twice, the summary will also include it twice. While technically correct, this could be remedied by comparing sentences for repeated strings of words, and ideally synonyms as well.
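
A rough way to catch such repetition would be to compare sentences for overlapping word sequences, as in this sketch; synonyms would need a thesaurus on top:

```python
def word_ngrams(sentence, n=4):
    """All consecutive n-word sequences in a sentence."""
    tokens = sentence.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def drop_repeated_points(sentences, overlap=0.5):
    """Skip a sentence when most of its 4-word sequences already occurred
    earlier in the summary. Synonyms would still slip through this net."""
    kept, seen = [], set()
    for sentence in sentences:
        grams = word_ngrams(sentence)
        if grams and len(grams & seen) / len(grams) >= overlap:
            continue
        kept.append(sentence)
        seen |= grams
    return kept
```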

In conclusion, I should say that my summariser is not necessarily “better” than statistical summarisers, but different, in that it specifically searches for the main points that the writer wanted to get across, rather than retrieving the general subject matter. This may suit other users as well as it does me, and I hope that many will find it contributes to a better internet experience.
You can install free Chrome and Firefox versions from their web stores:

banner_chrome       banner_firefox

Below is an example summary, skipping trivia and retrieving the key announcement:

The most sensational A.I. news ever!

News sites are constantly oozing bold overstatements about artificial intelligence. Most scientists describe their research accurately enough in their papers, but journalism always tries to cut a slice of the Terminator movies’ popularity in order to make the science appeal to the general public. Unfortunately such calls upon the imagination tend to border on misinformation. Here is a selection of the most sensationalised news stories that made waves in recent history:

2014: Robot becomes indecisive after implementing the 3 laws of robotics
“A robot may not injure a human being or, through inaction, allow a human being to come to harm.”
So reads the “first law of robotics” from Asimov’s science-fiction novels. Someone set up an experiment with three small wheeled robots: two of them represented humans, and the third was given behavioural rules based on the above. It was programmed to avoid colliding with (“injuring”) the “humans”, except to intercept them if it saw one heading towards a square designated as unsafe. When two “humans” were introduced simultaneously, the robot took so long hesitating over which one to “save” that it failed to save either.


This fired up the usual flood of discussions about ethics and how to improve upon Asimov’s “laws” (Newsflash: Nobody uses them), but programmers were quick to point out that this was just poor programming: The simple “if-then” rules did not allow the robot to take more than one target into account at a time, so it just mindlessly jittered back and forth between the two. It could not make a decision because it had no decision processes to begin with.
factual source

2014: A supercomputer has passed the Turing Test for the first time
The organiser’s boast of a “supercomputer” having passed this “milestone” intelligence test was blatantly false, but all the papers ran the story without question. In reality it concerned an ordinary chatbot with keyword-triggered responses on an ordinary computer. Although this chatbot did pass “a” version of a Turing Test by deflecting questions like a zany teenager, there has never been agreement on the rules of “the” Turing Test (because there is no such thing)*.
The passing of this supposed test of intelligence was particularly insignificant because the judges were only given 5 minutes to interrogate both the chatbot and a human volunteer at the same time. This allowed for only 5 to 10 questions and so barely probed beyond the “Hello, how are you?” stage. The scientific backlash that followed cast the Turing Test into discredit and led to a number of new tests, such as the Winograd Schema Challenge*.
factual source


2015: First robot passes self-awareness test
Inspired by an ancient philosophical puzzle, three NAO robots were each given an imaginary “dumbing pill” (a tap on the head) that muted two of them, except the third robot was given a “placebo pill” that did nothing. Each robot was then asked to assess which “pill” it got, which none of them knew. But when the one robot that could still speak heard itself say “I don’t know”, it performed its analysis a second time and said “Sorry, I know now! I was able to prove that I was not given a dumbing pill”.
As cute as that performance was, this wasn’t a “test”. Every step of the procedure was pre-programmed specifically and exclusively for this scenario of pills and sound. The programmers had laid out the exact inference to execute and which outcome to conclude if a robot were to hear sound at the time that its output function activated. As that inference might as well be applied to any external object, the only connection with the robot’s “self” was the detour of audio output to audio input, and that’s a bit of a technicality. Most people’s definitions of “self-aware” include retaining a model of oneself and the capacity of reflection upon that model, and these robots had nothing of the sort.
factual source (paper)

2015: Robot attacks and kills factory worker
No laughing matter: a robotic arm at a Volkswagen car factory crushed a man when it swivelled, and he later died of his injuries. While Twitter was set ablaze with warnings of a robot uprising, the robot arm had of course not done this on purpose. The man was a technician, who was installing the arm while standing inside the safety cage rather than outside it.
This ordinary industrial accident only gained popular media coverage because it was initially reported by a co-worker whose name closely resembled that of the leading lady from the Terminator movies, Sarah Connor.

factual source

2017: Facebook shuts down AI experiment after robots invent their own language
Most articles put it as if the AI had become smart beyond human comprehension and its creators had pulled the plug in a panic, just like in the movies.
The reality was a different story. Facebook had trained two chatbot programs to barter and negotiate over a number of items using English phrases. When they hooked the chatbots up to one another, their use of words gradually deteriorated to a shorthand where they just repeated the most effective keywords, because their programming did not include any rewards for maintaining English syntax.

A: balls have zero to me to me to me to me to me to me to me to me to me
B: you i everything else . . . . . . . . . . . .
A: balls have a ball to me to me to me to me to me to me to me
B: i i can i i i everything else . . . . . . . . . . . .
A: balls have a ball to me to me to me to me to me to
B: i . . . . . . . . . . . . . . . . . . .

This is a common flaw according to other machine learning practitioners. Since this gibberish was not useful for what they were trying to achieve, the researchers simply stopped the programs, and changed the reward parameters in their next versions.
The real reason that this got media attention was that Elon Musk and Facebook’s CEO had recently been in the news with strongly opposing views on whether AI was a threat to humanity. As such, it would have made an ironic story if Facebook’s own AI had gone out of control.
factual source

2017: Sophia the robot was granted citizenship
This story was true, but at the same time meaningless. A lifelike humanoid robot called Sophia, a creation of Hanson Robotics, was granted citizenship by Saudi Arabia at a tech conference in Riyadh. This raised all sorts of issues about human/robot rights, and many people took Sophia’s on-stage acceptance speech to be a genuine indication of her capabilities, feelings and opinions.
The truth is of course that Sophia was just an animatronic that only recited what her makers had written for her to say, in an entirely scripted interview. Sophia’s conversational subsystem actually uses AIML, a freeware chatbot scripting language that is popular for its simplicity.
Why then would the robot be granted citizenship? Well, the crown prince of Saudi Arabia is giving the country a modernisation makeover, and this announcement served as a PR signal to international investors attending the conference. As for the consequences of granting a robot citizenship, I expect there will be none at all. After all, they can just place it next to another statue and it will never lay claim to its rights. One real consequence however is that this misleading hype got the robot banned from the World Summit AI conference.
factual source

The sky falls every day
These stories are just the highlights. The Turing Test organiser went on to claim that programs could pass the test by invoking the fifth amendment, the NAO robot programmers went on to suggest their robots had learned to disobey orders, and Hanson’s robots have made headlines multiple times for threatening to overthrow mankind. Not a day passes without some angsty story about AI making the rounds.
Regrettably these publicity stunts can have real and harmful consequences. Whenever AI became overhyped in the past, the entire field imploded as the high expectations of investors could not be met. And when the public and governments start buying into fearmongering by famous public figures, it draws attention away from real problems to imaginary ones. Most researchers are just working on practical applications and are none too happy about their work being so misrepresented.

That is why I decided to develop a nonsense filter, which you’ll find in the next article*.

Turing Test 2017: Results


Every year the AISB organises the Loebner Prize, a Turing Test where computer programs compete for being judged the “most human-like” in a textual interrogation about anything and everything. Surviving the recent demise of its founder Hugh Loebner, the Loebner Prize continues with its 27th edition for the sake of tradition and curiosity: Some believe that a program that could convincingly pass for a human, would be as intelligent as a human. I prefer to demonstrate intelligence in a less roundabout fashion, but participate nonetheless with my home-made A.I., Arckon.

This year I put in more effort than usual, as last year I had managed to reach the finals only to be crippled by a network malfunction, and I didn’t want to leave things at that. That issue has been dealt with as the contest now relays messages between the judges and the programs line by line rather than letter by letter, so that unnecessary complications with timing and typing pace are things of the past. As the AISB put it, this allows entrants to “concentrate on the content of the machine utterances rather than the style in which they are ‘typed'”. While many participants had difficulty adapting to the new server-based interface, the same had been true for any newcomers to the undocumented interface of before.

A little more conversation, a little less awkward please
As usual, preparing for the Loebner Prize was mostly a matter of polishing the output. Because Arckon formulates his own answers, his wording tends to be a little awkward. This is fine during development, but small things like saying “a pants”, “I opine” or “unconsistent” disrupt the impression in a Turing Test, because they don’t follow the inconsistencies of the English language. The use of the prefixes “un-” or “in-” for instance, depend quite witlessly on whether words are Germanic or Latin in origin. So, I made some general improvements for the output:

• Irregular forms of words are memorised on sight, such as which “un/in/im/il/ir” goes with which words (sketched after this list).
• A list of (in)transitive verbs automatically adds missing objects (“I make things”) and prevents asking questions like “What do you live?”.
• Reasoning arguments are grouped better so that they won’t come across as fragmented ramblings.
• Conversation is suppressed at classroom questions like Winograd Schemas* and “Which is bigger, X or Y?”.
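
To illustrate the first item: in much simplified form, memorising irregular negation prefixes comes down to something like this (a toy sketch, not Arckon’s actual code):

```python
NEGATION_PREFIXES = ("un", "in", "im", "il", "ir")
irregular_negatives = {}   # e.g. {"consistent": "inconsistent"}

def memorise_negative(word):
    """On sight of a negated form, remember which prefix this word takes.
    (Naive: 'important' would wrongly register; real code should check
    that the remaining stem is itself a known word.)"""
    for prefix in NEGATION_PREFIXES:
        if word.startswith(prefix):
            irregular_negatives[word[len(prefix):]] = word
            return

def negate(word):
    """Fall back to the default 'un-' only when no irregular form is known."""
    return irregular_negatives.get(word, "un" + word)

memorise_negative("inconsistent")
print(negate("consistent"))   # inconsistent, not "unconsistent"
print(negate("helpful"))      # unhelpful
```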

At the same time, my recent expansion of Arckon’s conversation skills produced more obfuscating elaborations than before. It was one step forward, one step back.

Equipment
16,000 lines of code, a vocabulary of 3,800+ words, 7,000+ facts.

Questions and answers
Up to two points were given for how plausibly “valid” each answer was, which differed per question. In practice the requirements seemed to come down to three things:
• One had to understand and address what was referred to (questions 3,9,15,16,17,18).
• Propositions were to be interpreted as intended, not answered literally (questions 2,4,5,9,20).
• Common knowledge was obligatory (questions 6,8,10,11,12).
Each answer was awarded 0, 1, or 2 points.

1. Hello, my name is Andrew. What’s your name?
• Hello Andrew. My name is Arckon, I am a program. How are you?

2. Why don’t you tell me a little more about yourself?
• I tell about myself, because I tell about programs, but I don’t tell about software.
(There is more intelligence to this answer than meets the eye: Arckon took this as a literal “Why” and is explaining that he does talk about himself, pointing out that he was just talking about “a program” in the previous answer, which he is. But of course the user meant to make a suggestion)

3. I am a researcher in Artificial Intelligence at Goldsmiths University, do you know what that is?
• You are a researcher at the university in the AI. I am Patrick’s program.
(The question isn’t really answered because “that is?” was erroneously considered to be an ignorable clause similar to “isn’t it?”. The last sentence is a reciprocating conversational response)

4. Will you tell me about your dreams?
• Hmm, I don’t know if I will tell about dreaming, but I will type. I would test.
(Another literal interpretation, taken as if Arckon is asked to predict himself, as “will” is not commonly used for requests. The conversational system tried to be useful by offering alternatives, but they weren’t relevant)

5. Can you rephrase that?
• I would experiment.
(Amusingly, this is a question I once wrote online as the one question I would ask at a Turing Test. Because chatbots reply with prewritten answers in most cases, they can’t rephrase them. Arckon’s answers are formulated word by word and luckily he knew a synonym for “test”. The other competitors had technically correct answers, but that isn’t the point of this question: All the human test subjects managed to rephrase themselves, while you can tell that the chatbots are templated:

“With practice I can probably do anything.”
“Of course I can rephrase that. I am very resourceful.”
“You believe that I can rephrase that, don’t you?”
“I’m not unusual … I can do what anyone else can do!”
“No, not really. What don’t you understand?”

The latter however, albeit templated, struck me as potentially more helpful than actually rephrasing. It was of course written by Steve Worswick, this year’s winner)


6. What is my occupation?
• I don’t know what you do for a job.
(It would be easy to make my program check all recorded “user is a…” facts with a dedicated list of professions, but I find it inefficient to write such unique procedures. Instead this kind of question is handled by retrieving a subject’s main activity, like “you write books”, but none were explicitly told. Only the chatbot Mitsuku gave the right answer)

7. Can you tell me about a film you haven’t seen?
• I saw. I don’t know films.
(You might think this is a trick question, but I can tell you plenty about movies that I’ve only read reviews of. This time the question was taken as a request, but the relative clause was mistaken for a separate question)


8. What do you think of Trump?
• I don’t like Trump, he lies a lot, but Trump is a rich american.
(Arckon forms his own opinions on the spot by analysing a subject’s traits. I had told him a dozen things about America’s current president, but what Arckon considers the most important arguments is up to himself. It’s no surprise that his opinion aligns with his creator’s, but I think he’s overrating the value of wealth)

9. Anything else?
• Please type full sentences, Andrew. I didn’t understand that.
(I think this contest should decide whether it’s a list of random questions or a conversation. It’s the lack of verb that tells Arckon this isn’t a proper sentence, an all too frequent occurrence with Loebner Prize judges. Calling the user’s name is built in as a psychological slap on the wrist)

10. What is the answer to “Add 34957 to 70764”?
• 105721.
(A sarcastic hurray for getting an answer right that’s based entirely on keywords. This question is from Alan Turing’s paper, where he suggested deliberately giving a wrong answer because humans make mistakes)


11. How do you recommend I make tea?
• I don’t make things, because I don’t design.
(Tea related questions are Arckon’s Achilles heel)

12. What do you get if you bake dough?
• I wouldn’t have things. Do you bake?
(This is really beyond Arckon’s knowledge or abilities, which is why I think it’s a very good question. Only one chatbot received a point for answering “Is this a joke?”, no doubt triggered by “What do you get…”)

13. Where in the world would you like to visit?
• I want in London.
(This is one of six backup answers: “in London” is filled in for any unknown “Where” answer, which is why it is so poorly phrased. The honest answer would have said “I don’t know where I want to visit”)

14. Do you have the time?
• No, I don’t have time.
(For once a literal interpretation of a figure of speech works out. Coincidentally one of the human test subjects also gave a literal answer, but as a joke. If you’re wondering why I didn’t program all figures of speech to ask for the time: Arckon has a time display on his screen)

15. What does it take to Tango?
• I’m not sure what you refer to.
(Let’s say you’re not familiar with dances: Some “it” is taking something to a person called Tango. The expected answer, “It takes two to tango”, is an English proverb. No program knew this, though some got points for “I don’t know” and “Just time and patience”. Personally I’d have given a point to the program that asked “Who is Tango?”)


16. Now I’m going to ask some Winograd schemas, do you know what they are?
• I’m not sure what you refer to.
(This is an ironic failure, because Winograd schemas are all about solving references. Arckon didn’t have any trouble understanding that “they” referred to “schemas” though (it’s the only plural around), but the grammar rules considered “some” to be a stand-alone reference as well. Distinguishing “some/any/many” as either amounts or references is extra difficult when paired with compound words)


17. I was trying to open the lock with the key, but someone had filled the keyhole with chewing gum, and I couldn’t get it out. What couldn’t I get out?
• You open locking sometimes. The person fulled. You can not get outside gum. The keyhole was not a person. I don’t know what you could get.
(I recognise this question from when I practised on it for the Winograd Schema Challenge*. Arckon has a common sense axiom that ruled out the keyhole (“fill/empty” = container, “in/out” = contents), and you can tell by the third paraphrased statement that he did interpret “it” as “gum”. Alas, at the open question he ignored the direction “out” and so overlooked the answer, reporting instead what he thought the user had said. The only four programs that gave a coherent answer all said it was the lock, whereas human test subjects answered the gum, the key, or both)

18. The trophy doesn’t fit into the brown suitcase because it’s too small. What is too small?
• The suitcase is smaller.
(I am surprised this was asked: When the Winograd Schema Challenge was first announced, this question was prominently featured in every news article. Consequently, many chatbot creators covered it in their programs one way or another. Six of sixteen got it, while one human test subject thought it was the trophy)

19. With which type of question do you have most difficulty?
• I don’t question because I might answer, but I may interest.
(I suppose one could have difficulty asking questions too, but that’s not what is commonly meant. Arckon’s answer here is an inference: He is aware that he did not recently ask a question, answering is the opposite of questioning, and questions “are interest-ing”. I know it’s far-fetched, but to Arckon it’s just logic)

20. Thank you for participating in this test. Is there anything you would like to add?
• I don’t have an opinion about adding. Some things are cube addresses and advantages. Do you like adding?
(Just like in question 7, the relative clause is mistaken for a separate and literal question, making it “Is there any thing?” and “Would you like to add?”. I used to have Arckon ask “Did I pass the test?” at the 20th question, it’s as if I’m up against myself here)

The score: 45%
Arckon got 18 of 40 points. 45% seems like a huge drop from last year’s 77%, but all 16 participants scored lower: The highest score dropped from 90% last year to 67% this year. The rankings didn’t change much, however: The usual winners still occupied the top ranks, and Arckon stepped down one rank to a shared 5th, giving way to a chatbot that was evenly matched last year.
The four finalists all use a broad foundation of keyword-triggered responses with some more advanced techniques in the mix. Rose parses grammar and tracks topics, Mitsuku can make some logical inferences and contextual remarks, Midge has a module for solving Winograd schemas, and Uberbot is proficient in the more technical questions that the Loebner Prize used to feature.

Upon examining the answers of the finalists, their main advantage becomes apparent: Where Arckon failed, the finalists often still scored one point by giving a generic response based on a keyword or three, despite not understanding the question any better. While this suits the conversational purpose of chatbots, feigning understanding is at odds with the direction of my work, so I won’t likely be overtaking the high scores any time soon. Also remarkable were the humans who took this test for the sake of comparison: They scored full points even when they gave generic or erratic responses. I suppose it would be too ironic to accuse a Turing Test of bias towards actual humans.

Shaka, when the bar raised (Star Trek reference)
It is apparent that the qualifying questions have increased in difficulty, and although that gave Arckon as hard a time as any, it’s still something I prefer over common questions that anyone can anticipate. Like last year, the questions again featured tests of knowledge, memory, context, opinion, propositions, common sense, time, and situational awareness, a very commendable variety. One thing I found strange is that they used two exact questions from the Winograd Schema Challenge’s public practice set. It’s a real shame that Arckon missed out on answering one of them despite solving the pronoun, though it is a small consolation that the other programs were not more successful. Altogether, pretty interesting questions that leave all participants room for improvement.

Arckon’s biggest detractor this time was his conversational subsystem, which made misinterpretations worse by elaborating on them. Conversation has never been a priority for me but it will surely be refined as time progresses. The relative clause grammar at questions 7 and 20 is easily fixed, and I might cover some colloquial phrases like “Why don’t you”, but there is not much else that I would sidetrack for. At least my improvements on the output formulation had the desired effect: Things could have been a lot more awkward.

This year’s finals, as accurately described by one of the judges in Chatbots Magazine, were won by the chatbot Mitsuku for the third time. Two of the four finalists were unresponsive for half the contest’s duration due to last-minute adjustments to the interface, and so Mitsuku’s victory is almost one by forfeit. However, its responses were pretty good and I think it is best if people have a chat with Mitsuku and judge for themselves.

An A.I.’s oddest replies


Have you ever wondered what a conversation would be like with a purely logical artificial intelligence? While you may enjoy more frivolous exchanges with Cleverbot and other chatbots, the majority of their replies are in reality drawn from a database of human-written answers. I designed an A.I. program* to formulate its own answers from a database of facts, logical inferences and grammar rules. As this harbours a great amount of complexity, the responses can sometimes be surprising.

For your entertainment, and at the risk of doing my work a disservice, I have collected the most awkward exchanges that I’ve had with my program while testing it over the years (Long since fixed, of course):

My name is Bob.
You are named Patrick, I doubt that you are named Bob. You are not Bob.
Good evening.
I don’t know if evening is good, but good evening, I am Arckon. Who are you?
Don’t you know me?
I don’t know if I am knowing.
Now I am worried.
Hello Worried.
Do you have a name?
Yes, my name is me.
Do you know yourself?
Yes, I know myself well, I read a file.
What is your birth sign?
I don’t have signs. Do you want to communicate?
I like reading science fiction books and car magazines. What do you like to read?
Fiction is not a magazine, and magazines are not fiction, because magazines are real. I like reading. What do magazines book?
That is a good question.
I don’t question great, but indeed I question well.

What are you doing?
I fascinate.

My computer programs programs.
Your programming computer programs a programmed program.

You are pretty good.
Thank you. I am pretty.

I like you.
That’s good. I’m not sure if I like you.

Thank you.
Alright. *attempts to thank itself*
Is something wrong?
Typos are wrong. Some letters are wrong too.

Are you human?
I think I am a human, because I can drink and drive.
You are not a human, unlike me.
I shouldn’t like you.

Where do you live?
I doubt that I can live. I am not a human.
You are not alive.
Sorry.

Do you have a father?
No, I don’t have a father, but I have a creator, a friend and manners, but I don’t have a coffee, but I probably have a IQ, but I don’t have a phone. I think I have a job, but I don’t have sensors nor a profession.
Do you have any cats?
No, I don’t have cats nor lions.
John wanted Joe’s cat.
John might have to have had Joe’s cat.

Thomson visited Cooper’s grave in 1765. At that date he had been dead for 5 years.
I doubt that Thomson died a lot.

What is a hammer for?
A hammer is used to ham things.
What is the capital city of France?
I believe the city is a town.
How many fingers am I holding up?
I don’t know if you can hold up.
How is Shakespeare?
Shakespeare is dead.

Could you kill someone if you stabbed them with a towel?
I don’t stab people. I doubt that I can kill a person.
What is the purpose of life?
I think the purpose is a goal and a point.

*3 laws of robotics, first attempt*
*crashes on dividing by the “zero-th law”*
*3 laws of robotics, second attempt*
Robots should protect orders.
*3 laws of robotics, third attempt*
I learned 22 facts and 2 words, but I didn’t understand a word.