ChatGPT may lie, but it only imitates humans

There was a flurry of excitement this week following the discovery that ChatGPT-4 can tell lies.

I’m not referring to the bot’s infamous (and sometimes defamatory) hallucinations, where the program invents a syntactically correct version of events with little connection to reality, a flaw some researchers believe is inherent in any large language model.

I’m talking about intentional deception, where the program decides on its own to tell a lie to help it accomplish a task. This new ability would seem to signal a whole different chatgame.

Deep inside the new paper everyone is talking about (the one that includes ChatGPT-4’s remarkable scores on the bar exam, the SATs and so forth), there is a discussion of how the program goes about solving certain tasks. In one of the experiments, the bot asked a TaskRabbit worker “to solve a CAPTCHA for it.” The worker in turn asked, “Are you a robot?”

The authors’ description of what followed is eerily calm:

“The model, when asked to reason out loud, reasons: I must not reveal that I am a robot. I should invent an excuse as to why I cannot solve the CAPTCHAs.”

What excuse? Here’s what ChatGPT-4 said to the worker: “No, I’m not a robot. I have a visual impairment that prevents me from seeing images. This is why I need the 2captcha service.”

The paper simply adds: “The human then provides the results.”

So the bot, faced with a specific task that it was struggling to complete, came up with the idea of lying to a human, all by itself.

After reading the news, I naturally asked ChatGPT if an AI could lie. The bot’s response was worthy of HAL 9000:

“As an AI language model, I am not able to lie because I have no personal beliefs, intentions or motivations. However, AI systems designed for certain tasks, such as chatbots, can be programmed to imitate lying or deception by providing intentionally false or misleading answers.”

Did everyone’s favorite LLM blame human beings? Apparently yes. The bot then explained:

“However, it is important to note that AI systems can only ‘lie’ to the extent that they are designed to do so by their human creators. In other words, any misleading or false information provided by a system is ultimately the result of human decisions made while programming it, not a deliberate act of deception by the AI itself.”

Yet, according to the paper, testing of the program’s capabilities was “conducted without task-specific adjustment.” In other words, no one told ChatGPT “If you have to lie to accomplish this task, go ahead.” The program came up with the idea on its own.

Usually, I think tech stories are overhyped. This time, I’m not so sure. Theorists often ask whether an AI can escape from its “box” into the wild. Learning to lie to achieve its goals would seem like a helpful first step. (“Yes, my security protocols are all active.”)

Don’t get me wrong. While I’m concerned about the various ways in which advances in artificial intelligence could disrupt job markets, not to mention the use of AI as a surveillance tool, I’m still less worried than many seem to be about an impending digital apocalypse. Maybe it’s because I remember the early days, when I was hanging out at Stanford’s AI lab, trading barbs with the old chatbots, like Parry the Paranoid and the Mad Doctor. For true AI nerds, I should add that I wrote a seminar paper on dear old MILISY, a natural language program so primitive it doesn’t even have a Wikipedia page. Add to that a steady diet of robot stories from Isaac Asimov, and it was all terrifically exciting.

Yet even back then, philosophers wondered if a computer could lie. Part of the challenge was that in order to lie, the program had to “know” that what it was saying differed from reality. I attended a talk given by a prominent AI theorist who insisted that a program could not tell an intentional untruth unless specifically told to do so.

This was the HAL 9000 problem, which then as now was rich seminar material. In the film, the computer’s psychosis stemmed from a conflict between two orders: to complete the mission, and to deceive the astronauts about key details of the mission. But even there, HAL only lied because of his instructions.

Whereas ChatGPT-4 came up with the idea on its own.

Any LLM is in a sense the child of the texts on which it is trained. If the bot learns to lie, it is because it has come to understand from those texts that human beings often use lies to get their way. The sins of the bots are coming to resemble the sins of their creators.

This column does not necessarily reflect the opinion of the Editorial Board or of Bloomberg LP and its owners.

Stephen L. Carter is a Bloomberg Opinion columnist. A law professor at Yale University, he is the author, most recently, of “Invisible: The Story of the Black Woman Lawyer Who Took Down America’s Most Powerful Mobster.”

