OpenAI Set to Mitigate Risk of AI ‘Hallucinations’

by Darya Rudz · 3 min read

To mitigate the risk of AI hallucinations, OpenAI has come up with a new approach that involves rewarding AI models for each correct step of reasoning instead of only rewarding a correct final answer.

OpenAI, the artificial intelligence (AI) research company behind the AI-powered chatbot ChatGPT, is looking for ways to combat AI “hallucinations”. On Wednesday, OpenAI announced its work on a training method that would improve ChatGPT’s mathematical problem-solving abilities and help prevent the spread of misinformation.

According to OpenAI, mitigating hallucinations is a “critical step” towards building aligned AGI and improving models’ ability to solve reasoning problems. Hallucinations are responses generated by an AI model that are not justified by its training data yet are presented as fact. One example is ChatGPT’s incorrect answer to the question “When did Leonardo da Vinci paint the Mona Lisa?”: instead of replying that the Mona Lisa was painted between 1503 and 1506, the chatbot said it was created in 1815.

Another case that provoked a wave of criticism took place in April, when ChatGPT accused US criminal defense attorney and law professor Jonathan Turley of committing sexual assault, citing as its source a 2018 Washington Post article that was never published. This is just one of many examples of how AI can mislead.

To mitigate the risk of hallucinations, OpenAI has come up with a new approach that involves rewarding AI models for each correct step of reasoning instead of only rewarding a correct final answer. The principle is known as “process supervision”.

OpenAI explained:

“We can train reward models to detect hallucinations using either outcome supervision, which provides feedback based on a final result, or process supervision, which provides feedback for each individual step in a chain-of-thought.”

In its research, the company compared the two approaches and found that process supervision is the more powerful tool, leading to significantly better performance.

The company stated:

“Process supervision has several alignment advantages over outcome supervision. It directly rewards the model for following an aligned chain-of-thought, since each step in the process receives precise supervision. Process supervision is also more likely to produce interpretable reasoning, since it encourages the model to follow a human-approved process. In contrast, outcome supervision may reward an unaligned process, and it is generally harder to scrutinize.”
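To make the distinction concrete, the toy Python sketch below contrasts the two feedback signals. It is only an illustration of the idea, not OpenAI’s training code: the example reasoning steps, the grading logic, and the function names are all hypothetical placeholders.

```python
from typing import List


def outcome_reward(final_answer: str, correct_answer: str) -> List[float]:
    """Outcome supervision: a single reward based only on the final result."""
    return [1.0 if final_answer == correct_answer else 0.0]


def process_reward(step_labels: List[bool]) -> List[float]:
    """Process supervision: one reward per reasoning step.

    `step_labels` stands in for human (or reward-model) judgments of whether
    each individual step in the chain-of-thought is correct.
    """
    return [1.0 if ok else 0.0 for ok in step_labels]


if __name__ == "__main__":
    steps = [
        "Leonardo da Vinci began the Mona Lisa around 1503.",
        "He is thought to have worked on it until about 1506.",
        "Therefore it was painted between 1503 and 1506.",
    ]
    # Outcome supervision only checks the end of the chain...
    print(outcome_reward("1503-1506", "1503-1506"))      # [1.0]
    # ...while process supervision gives feedback on every step.
    print(process_reward([True, True, True]))            # [1.0, 1.0, 1.0]
```

The point of the contrast is that a single end-of-chain reward can still be earned by flawed reasoning that happens to land on the right answer, whereas per-step rewards penalize each faulty link in the chain.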

OpenAI warns its users against blindly trusting ChatGPT, stating that “ChatGPT may produce inaccurate information about people, places, or facts”. Its website reads:

“When users sign up to use the tool, we strive to be as transparent as possible that ChatGPT may not always be accurate. However, we recognize that there is much more work to do to further reduce the likelihood of hallucinations and to educate the public on the current limitations of these AI tools.”

In addition to the research paper, OpenAI has also released a dataset of 800,000 human labels it used to train the reward model described in the research. However, it is not yet clear whether the paper has been peer-reviewed, so for now it can only be seen as preliminary research.
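For readers curious about what step-level human labels look like in practice, the snippet below shows one way such annotations might be read. The file name and JSON schema here are invented for illustration and need not match the layout of OpenAI’s released dataset.

```python
import json


def load_step_labels(path: str):
    """Read a hypothetical JSONL file of per-step human ratings.

    Assumed layout (not OpenAI's actual schema): one JSON object per line,
    each holding a question, its reasoning steps, and one rating per step.
    """
    examples = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            examples.append({
                "question": record["question"],
                "steps": record["steps"],      # list of reasoning steps
                "ratings": record["ratings"],  # one human rating per step
            })
    return examples


if __name__ == "__main__":
    for example in load_step_labels("step_labels.jsonl")[:3]:
        print(example["question"], example["ratings"])
```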
