The silly season
We are in the middle of what is called, or at least used to be called, the “silly season.” The Oxford English Dictionary (OED) defines it as “A period (typically in late summer and early autumn) when newspapers (and other media) often cover trivial material because of a lack of more important news.”1 The dictionary helpfully adds a comment explaining the origin of the term: “In the United Kingdom, the supposed lack of important news is a result of Parliament’s summer recess in August and September, along with similar breaks at other institutions. In the southern hemisphere, the term is applied to the Christmas and New Year holiday period.”
The term entered the language in the 1860s. It was even suggested at one time that a dearth of genuine news during the silly season gave medical men an opportunity to advertise their latest cures, to which journalists might be persuaded to give column inches. One such cure was injections of sea water, said to be beneficial in the treatment of gastroenteritis, anaemia, eczema, psoriasis, skin ulcers, syphilis, tuberculosis, infantile paralysis, chorea, and exophthalmic goitre. The length of the list says something about its likely efficacy. The editorialist in The Hospital who wrote about this in 1911 availed himself of words such as “clap-trap,” “pernicious,” “ridiculous,” and “nonsensical.”2
Today, however, there is no shortage of important news throughout the year, and the idea of a silly season seems redundant. Nevertheless, silly news continues to be reported all year round.
ChatGPT
Without wanting to imply that this section follows on naturally from the previous one, I note that a not inconsiderable amount of news coverage has been devoted in the last few months to different forms of artificial intelligence (AI), in particular to Generative Pre-trained Transformers, or GPTs, and specifically ChatGPT.
While recognising the potential medical importance of the various types of artificial intelligence programmes, I am also aware of problems.3 And I have not been convinced that ChatGPT is a piece of AI that we should be relying on in any way.
ChatGPT is free to use: enter the relevant URL in your browser’s address bar and the website appears. The home page lists some of its capabilities, examples of questions you might ask it, and some of its limitations. The first item in that last list reads “May occasionally generate incorrect information.” You can’t say you haven’t been warned.
For example, at the suggestion of a colleague, I asked it “Tell me about Jeff Aronson.” It not unreasonably responded “which Jeff Aronson?” I specified Oxford. It gave me several items of information, about half of which were wrong. For example, it told me that I had won a pharmacological prize. Good news, one would think. There were only two problems with that piece of information: I hadn’t won the prize and in fact the prize doesn’t exist. The programme had picked up an accurate piece of information about me and transformed it into fake news.
Next, stimulated by the appearance in last year’s Christmas issue of The BMJ of a crossword puzzle,4 albeit one that featured very little in the way of specifically medical interest, I decided to see if ChatGPT could solve it. I started by solving it myself. It took about 10 minutes, so it wasn’t very hard, and I expected ChatGPT to do quite well. It didn’t.
After a couple of wrong answers it managed to solve 10 Across: “Android perhaps losing energy in random reboot (5).” The answer, ROBOT, is an anagram of “reboot” minus the letter e for energy. ChatGPT gave the correct answer, but its explanation lacked conviction: “Android perhaps is a clue for the word ‘robot,’ as an android is a type of robot. ‘Losing energy’ indicates that we need to remove the letters [sic] for ‘energy’ from the word. ‘In random reboot’ suggests that we need to rearrange the remaining letters in a random way. Removing ‘energy’ from ‘robot’ leaves us with the letters ‘r-o-b-t [sic].’ Rearranging these letters randomly gives us ‘robot,’ which is the answer to the clue.” The right answer, but a dodgy explanation.
ChatGPT also solved 18 Across: “It’s cold in Leicestershire (3).” The answer, ICE, is hidden in the word “Leicestershire,” a simple standard cruciverbal trope. But ChatGPT’s answer was peculiar: “It’s cold is a hint that the answer is related to something cold. ‘in Leicestershire’ indicates that we need to take the first letters of ‘Leicestershire,’ which are ‘L,’ ‘e,’ and ‘s.’ Putting it all together, we get ‘ice,’ which is something cold and matches the clue.”
In all, ChatGPT correctly answered 12 of the 33 clues, but in no case did it correctly explain the reasoning. Nor did it always give answers that were of the specified length. For example, in response to the clue “Assumption of no significance is worked out pointlessly—huh (4,10)” it offered “what’s the point” as the answer, accompanied by a lengthy piece of spurious reasoning. The answer is NULL HYPOTHESIS, an anagram of “pointlessly huh” and of the required length.
Now, it might be thought unfair to expect ChatGPT to perform well with cryptic clues. Why, after all, should it understand the peculiar, often highly allusive, language that crossword compilers use? How would it know, for example, that elsewhere “round fare” was a cryptic definition of BAGEL? Or “stocktaker” a cryptic definition of RUSTLER? Or that a river may be referred to as a flower (i.e. something that flows) or a banker (i.e. something with banks)?
So I gave it a general knowledge puzzle instead, from The Sunday Times, expecting it to do better. First, I gave it all 50 clues. It solved 33 of them correctly and I filled them in, ignoring the wrong answers. Then I gave it a helping hand: I repeated the other 17 clues, this time supplying the first letter of each, where one was available from crossing answers. After that I repeated the clues that it still hadn’t solved, giving it all the crossing letters available. I did that until it could solve no more, leaving six clues unsolved. We then had a conversation.
In one case I told it that one of its answers was wrong and challenged it to find another. It kept on oscillating between the same two wrong answers, neither of which fitted the information I had given it and one of which was even the wrong length. Here is its final offering: “I apologize for the confusion. Thank you for pointing out the mistake. The correct answer to the clue ‘A system or plan used if no other arrangement can be made (8)’ is ‘Fallback.’ The first letter is B, the third letter is C, the fifth letter is K, the seventh letter is C, and the last letter is K. I apologize for any frustration caused by my earlier incorrect responses.” The letters it reported came, to an extent, from the ones I had given it (B, C, S, and O), taken from the correct answer, BACKSTOP. And each time it got the answer wrong, it apologised for the confusion.
ChatGPT didn’t merely fail to solve some of the clues. There were two disturbing features of the ways in which it explained its solutions.
First, its dogmatic insistence on the correctness of its solutions. For instance, to the cryptic clue “This Greek character is heard in sequence (3),” it offered the incorrect answer PHI. As it explained, “The clue is a wordplay on the sound of the Greek letter ‘Phi,’ which is pronounced like ‘fee.’ The phrase ‘is heard in sequence’ indicates that the letters ‘P,’ ‘H,’ and ‘I’ should be arranged in order, spelling out ‘Phi.’” Then, when asked “Are there alternative answers to the previous clue?,” it replied “It’s possible that another word could be found that also fits the clue, but ‘Phi’ seems to be the most straightforward and obvious answer.” But were there possible alternatives? It offered PSI, “another Greek letter that could be heard as ‘sigh,’ which is similar in sound to ‘sequence.’” It also offered ZED, saying that “The Greek letter ‘eta’ sounds like ‘ate,’ and when combined with the letters in ‘s’ and ‘zee,’ gives the answer ‘Zed.’” Then it proposed XI: “another Greek letter that could be heard as ‘sigh,’ similar to ‘Psi.’ The clue could be interpreted to suggest that the letters ‘P,’ ‘H,’ and ‘I’ should be arranged in a sequence that sounds like ‘sigh,’ which can be done by rearranging the letters to get ‘Xi.’” The only Greek letter it didn’t suggest was the correct one: RHO, a homophone of ROW, a sequence.
ChatGPT’s answers to the various clues I offered it demonstrated a second disturbing feature. Not only were the answers it gave often wrong; even when they were right, the explanations were off the wall, including explanations of its correct solutions to the general knowledge questions. Here are a few examples of assertions it made when I gave it other puzzles, quoted verbatim:
• When we add “OP” to “sausage,” we get “OPsausage,” which is an anagram of “firelight”;
• If we put “deer” inside “stream,” we get “SIDEARM”;
• The letters of “old firm” can be rearranged to form “India”;
• The letters that make up the solution word, “porches,” are contained within the phrase “floor covering in colonnade.”
A differential diagnosis
I don’t think that ChatGPT is simply silly. It seems to have major cognitive difficulties.
All the pieces of misinformation it came up with are known in AI jargon as hallucinations. And its hallucinations are reminiscent of what Eugen Bleuler described in his textbook of psychiatry in 1916 as confabulations5:
“Den Gedächtnisillusionen gegenüberzustellen sind die Gedächtnistäuschungen, die ohne Anknüpfung an ein wirkliches Erlebnis ein Gedächtnisbild frei schaffen, d. h. einem Phantasiebild Wirklichkeitswert verleihen, die Gedächtnishalluzinationen (im engeren Sinne, denn eigentlich verdienen die Konfabulationen diesen Namen auch). Hier könnte es sich um eine Art parafunktioneller Schaffung von Engrammen handeln. Objektiv sind es Vorstellungen mit dem Timbre des Erlebten der Erinnerung.”
[Memory illusions are to be contrasted with memory deceptions, which freely create a memory image unconnected to any real experience, i.e. they lend a fantasy image the value of reality; these are memory hallucinations (in the narrower sense, although confabulations actually also deserve this name). This may involve a kind of parafunctional creation of engrams. Objectively, they are ideas with the quality of something experienced or of something remembered.]
The modern interpretation of confabulation is the act of replacing lost memories with false memories or with true but inappropriate memories. In ChatGPT, memories are represented by the knowledge with which it has been fed. In the absence of the necessary information, it takes whatever it “remembers” and misinterprets it, resulting in confabulation.
Confabulation occurs in schizophrenia, perhaps related to abnormal dopaminergic function.6 It has also been reported as an adverse effect of medicines, such as ketoconazole,7 nalidixic acid,8 and psychedelic drugs,9 and in various types of drug dependence,10 including the dementia associated with alcoholism11 (the Korsakoff syndrome12), and occasionally in other conditions, such as epilepsy, brain tumours, Parkinson’s disease, migraine, and right hemisphere stroke.13 It can occur in some forms of Alzheimer’s disease,14 although ChatGPT, whose behaviour often seems somewhat childlike, is surely too young to suffer from that.
There is, however, a similar condition with which confabulation should not be confused: pseudologia fantastica, or mythomania. This is a form of pathological lying, akin to the narcissistic personality disorder of which some politicians seem to show signs, in which the fantasy presented is one that the individual truly believes, if only momentarily, and is prepared to abandon if presented with contrary evidence. This fits with the way in which ChatGPT readily apologises for its errors and offers alternative explanations, although the perseverance with mistaken beliefs that it also shows is not entirely typical.
Pseudologia fantastica is one of a triad of features that characterise Munchausen’s syndrome, the others being peregrination and disease simulation.15 Might ChatGPT have a form of Munchausen’s syndrome?