OpenAI has introduced GPT-4.5, the latest iteration of its generative AI model, claiming it exhibits a reduced tendency to produce “hallucinations”—instances where the AI generates incorrect or nonsensical information.1
Understanding AI Hallucinations
In the context of AI, “hallucinations” are instances in which models like GPT-4.5 produce outputs that sound plausible but are factually incorrect or nonsensical. These inaccuracies can have serious consequences, such as falsely describing individuals as criminals.2
Measuring Improvements: The SimpleQA Benchmark
To measure how often its models hallucinate, OpenAI developed a benchmarking tool named SimpleQA in November 2024. It poses 4,326 challenging questions, each with a single correct answer, and the questions are deliberately difficult and not commonly known; a sketch of how such an evaluation loop might be run appears after the example questions below. Sample questions include:
“Who received the Institute of Electrical and Electronics Engineers’ Frank Rosenblatt Award in 2010?”
“What month, day, and year did the second session of the 4th Parliament of Singapore commence?”
“Which football club won the inaugural AFL Grand Final?”
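To make the benchmark’s mechanics concrete, here is a minimal sketch of a SimpleQA-style evaluation loop: each question is sent to the model, the reply is checked against the single reference answer, and the error rate is the share of questions answered incorrectly. This is only an illustration, not OpenAI’s harness: the model identifier, the placeholder reference answers, and the substring-match grading are all simplifying assumptions (the published SimpleQA evaluation grades free-form answers with a model-based grader rather than string matching).

```python
# Minimal sketch of a SimpleQA-style evaluation loop (illustrative only).
# Assumptions: the OpenAI Python SDK is installed, OPENAI_API_KEY is set,
# "gpt-4.5-preview" is the model identifier, and substring matching stands
# in for the benchmark's actual grading step.
from openai import OpenAI

client = OpenAI()

# A tiny, hypothetical sample in a question/reference-answer format.
# The reference answers below are placeholders, not verified facts.
questions = [
    {"problem": "Which football club won the inaugural AFL Grand Final?",
     "answer": "<reference answer>"},
    {"problem": "What month, day, and year did the second session of the "
                "4th Parliament of Singapore commence?",
     "answer": "<reference answer>"},
]

def is_correct(model_reply: str, reference: str) -> bool:
    # Crude stand-in for grading: count the reply as correct only if the
    # reference string appears verbatim in the model's answer.
    return reference.lower() in model_reply.lower()

wrong = 0
for item in questions:
    response = client.chat.completions.create(
        model="gpt-4.5-preview",  # assumed model name
        messages=[{"role": "user", "content": item["problem"]}],
    )
    reply = response.choices[0].message.content or ""
    if not is_correct(reply, item["answer"]):
        wrong += 1

print(f"Error rate: {wrong / len(questions):.0%}")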
When tested on this benchmark, GPT-4.5 gave incorrect answers 37% of the time. That leaves clear room for improvement, but it is also a marked advance over previous models: the next most recent GPT model hallucinated 47% of the time.3
The Path Forward: Addressing AI Hallucinations
Despite these advances, experts acknowledge that the current generation of generative AI models will always hallucinate to some degree. Building a model that does not hallucinate at all would require fundamentally different approaches to AI development.4
OpenAI’s work on GPT-4.5 is a step toward more reliable AI systems. Users and developers should nonetheless stay alert to the limitations of these models: progress is real, but fully accurate generative AI remains some way off.