Why AI Models Like ChatGPT “Hallucinate” – And How to Fix It


Generative AI models, like ChatGPT, have revolutionized how we interact with technology. However, a significant challenge remains: the tendency to “hallucinate” – confidently presenting false information as fact. A recent study from OpenAI, the company behind ChatGPT, has pinpointed the primary reason for this behavior: AI models are incentivized to guess rather than admit they don’t know.

The Root of the Problem: Incentivizing Guesswork

Currently, the methods used to evaluate AI model performance reward accuracy above all else: models are graded primarily on the percentage of questions they answer correctly, regardless of how confident they are in those answers. Researchers at OpenAI argue this creates a system in which guessing becomes a strategic advantage.

The study compares this to a student taking a multiple-choice test: leaving a question blank guarantees zero points, so the student is encouraged to take a wild guess rather than skip it.

Similarly, when AI models earn credit only for correct answers, and a wrong answer costs them no more than admitting they don't know, they are encouraged to produce an answer even when they lack sufficient information. This leads them to generate plausible-sounding but ultimately false statements.
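To see why guessing wins under accuracy-only grading, consider a quick back-of-the-envelope calculation (the numbers below are illustrative, not taken from the study):

```python
# Toy illustration: expected score under accuracy-only grading.
# The 25% figure is an assumption for the example, not from the study.

p_correct_guess = 0.25  # chance a blind guess happens to be right

# Accuracy-only grading: 1 point for a correct answer, 0 otherwise.
score_if_guessing = p_correct_guess * 1 + (1 - p_correct_guess) * 0
score_if_abstaining = 0  # saying "I don't know" earns nothing

print(f"Expected score when guessing:   {score_if_guessing:.2f}")
print(f"Expected score when abstaining: {score_if_abstaining:.2f}")
# Guessing always scores at least as well as abstaining, so a model
# optimized against this metric learns to answer even when unsure.
```

However small the chance of being right, guessing never scores worse than admitting uncertainty, which is exactly the incentive the researchers describe.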

How AI Models “Learn” and Why Hallucinations Arise

AI models learn by predicting the next word in a sequence of text, drawing from massive datasets. While these datasets often contain consistent patterns, they also include random and contradictory information. When confronted with questions that are ambiguous or lack definitive answers – situations inherently characterized by uncertainty – AI models frequently resort to strategic guesses to improve their overall accuracy score.
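As a rough illustration of how this plays out at the level of next-word prediction, imagine a model whose training data contains conflicting completions for the same question (the words and probabilities below are invented for the example):

```python
# Toy next-word prediction: the training data was contradictory, so no
# completion is clearly right, yet greedy decoding still commits to one.
# All values here are made up for illustration.

next_word_probs = {
    "1947": 0.32,      # plausible but unverified "fact"
    "1952": 0.30,      # a competing, equally plausible "fact"
    "1961": 0.28,
    "unknown": 0.10,   # the honest answer is rarely the most likely token
}

best_word = max(next_word_probs, key=next_word_probs.get)
print(f"Model outputs: {best_word} "
      f"(probability {next_word_probs[best_word]:.2f})")
# The model states "1947" with apparent confidence even though it assigns
# that answer barely a third of the probability mass.
```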

“That’s one reason why, even as models get more advanced, they can still hallucinate, confidently giving wrong answers instead of acknowledging uncertainty,” researchers note.

Addressing the Issue: Rewarding Honesty and Uncertainty

Fortunately, a straightforward solution to this problem is emerging. Researchers suggest penalizing “confident errors” more heavily than expressions of uncertainty, while also giving models partial credit for appropriately acknowledging their limitations.

This mirrors standardized tests that assign negative marks for wrong answers or award partial credit for questions left blank. Such a system would discourage blind guessing and incentivize models to express uncertainty when appropriate, as the sketch below illustrates.
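Here is a minimal sketch of how such a scoring rule changes the incentive. The specific point values are assumptions chosen for illustration, not OpenAI's actual scheme:

```python
# Sketch of an evaluation rule that penalizes confident errors and gives
# partial credit for abstaining. Point values are illustrative assumptions.

CORRECT = 1.0    # full credit for a right answer
ABSTAIN = 0.25   # partial credit for honestly saying "I don't know"
WRONG = -1.0     # penalty for a confidently wrong answer

def expected_score(p_correct: float, abstain: bool) -> float:
    """Expected score on one question, given the model's chance of being right."""
    if abstain:
        return ABSTAIN
    return p_correct * CORRECT + (1 - p_correct) * WRONG

for p in (0.9, 0.6, 0.3):
    answer = expected_score(p, abstain=False)
    skip = expected_score(p, abstain=True)
    better = "answer" if answer > skip else "abstain"
    print(f"P(correct)={p:.1f}: answer={answer:+.2f}, "
          f"abstain={skip:+.2f} -> better to {better}")

# Guessing now pays off only when the model is genuinely confident;
# low-confidence guesses score worse than admitting uncertainty.
```

Under this kind of rule, the strategic calculation flips: answering is only worthwhile when the model's confidence is high enough to outweigh the penalty for being wrong.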

To that end, OpenAI suggests updating the evaluation methods currently used for generative AI: "The widely used, accuracy-based evals need to be updated so that their scoring discourages guessing," the researchers write. By shifting the focus away from rewarding accuracy alone, developers can pave the way for more nuanced language models that are less prone to hallucination.

In conclusion, the tendency of AI models to “hallucinate” stems from a flawed evaluation system. By incentivizing honesty and acknowledging uncertainty, we can develop AI that is both powerful and trustworthy. This shift is particularly critical as AI finds increasing use in fields like medicine and law, where accuracy and reliability are paramount.