How reliable is ChatGPT? Research shows a concerning level of inaccurate information.

ChatGPT and similar AI models are often perceived as extremely intelligent but, at the same time, as unreliable sources of information.

A recent study conducted by OpenAI and reported by The New York Times sheds light on this contradiction. Specifically, OpenAI discovered that its latest models, including o3 and o4-mini, have an increased tendency to generate "hallucinations": inaccurate or entirely fabricated information.

ChatGPT and Hallucinations

OpenAI's latest flagship models, o3 and o4-mini, are designed to mimic human reasoning. Unlike their predecessors, which primarily focused on generating fluent text, o3 and o4-mini are supposed to "think step by step." OpenAI has boasted that o3 can match or exceed the performance of PhD holders in chemistry, biology, and mathematics. However, OpenAI's own report reveals alarming data for anyone who takes ChatGPT's responses at face value, as noted by Index.hr.

Hallucination Rate Up to 79%

OpenAI found that o3 hallucinated in one-third of the tasks on its benchmark of questions about public figures, double the rate of last year's o1 model. The more compact o4-mini performed even worse, hallucinating on 48% of similar tasks.

When the models were tested with general-knowledge questions from the SimpleQA benchmark, the hallucination rate surged to 51% for o3 and 79% for o4-mini. This is not merely a minor glitch in the system; it is a genuine identity crisis. One would expect a system marketed as a "reasoning" model to at least double-check before fabricating anything, but that is simply not the case.
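To make the term "hallucination rate" concrete, the short sketch below is a toy illustration rather than OpenAI's actual evaluation code: it assumes each benchmark answer has already been graded as correct, incorrect, or not attempted (the categories SimpleQA-style evaluations use) and simply computes the share of attempted answers that were wrong. All names and data in it are hypothetical.

```python
# Toy illustration only (not OpenAI's evaluation code): computing a
# hallucination rate from benchmark answers that have already been graded.

def hallucination_rate(grades):
    """Return the share of attempted answers that were factually wrong.

    grades: list of strings, each "correct", "incorrect", or
    "not_attempted" (the model declined to answer the question).
    """
    attempted = [g for g in grades if g != "not_attempted"]
    if not attempted:
        return 0.0
    wrong = sum(1 for g in attempted if g == "incorrect")
    return wrong / len(attempted)

# Hypothetical grades for ten benchmark questions.
example_grades = [
    "correct", "incorrect", "incorrect", "correct", "not_attempted",
    "incorrect", "correct", "incorrect", "incorrect", "correct",
]
print(f"Hallucination rate: {hallucination_rate(example_grades):.0%}")  # 56%
```

Read against a measure like this, the 51% and 79% figures above mean that, on those general-knowledge questions, roughly half to four-fifths of the answers the models ventured were wrong.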

"Perhaps They Are Just More Detailed in Their Responses"

One theory circulating in the artificial-intelligence community suggests that the more a model "thinks," the greater its chances of making a mistake. Unlike simpler models that stick to highly confident predictions, reasoning models venture into areas where they must weigh multiple pathways, connect distant facts, and essentially improvise, and improvising with facts often leads to fabrication.

OpenAI told The Times that the increased number of hallucinations may not stem from flaws in the reasoning models themselves; the models may simply be more expansive and "freer" in their responses.

Models Should Be Helpful, Not Dangerous

Because the new models do not merely repeat predictable facts but speculate about possibilities, the boundary between theory and fabrication becomes blurred for the AI. Unfortunately, some of these "possibilities" are entirely detached from reality.

However, a higher incidence of hallucinations is the opposite of what OpenAI and competitors such as Google and Anthropic want. Calling AI chatbots "assistants" or "co-pilots" implies that they are helpful, not dangerous. Lawyers have already gotten into trouble for using ChatGPT and failing to recognize the fabricated court precedents it cited. Who knows how many such errors have caused difficulties in lower-stakes situations?

The More It Is Used, the Less Room There Is for Mistakes

The potential for hallucinations to cause problems rapidly expands as AI enters classrooms, offices, hospitals, and government services. Advanced artificial intelligence can assist in writing job applications, resolving billing issues, or analyzing spreadsheets, but the paradox is that the more useful AI becomes, the less room there is for mistakes.

You cannot claim to save someone time and effort if they must spend just as much time verifying everything you say. That is not to say these models are not impressive: o3 has demonstrated remarkable coding and logical abilities and surpasses many people in certain respects. The problem arises the moment it decides that Abraham Lincoln hosted a podcast or that water boils at 27°C; at that point, the illusion of reliability dissipates.

Until these issues are resolved, approach every response from an AI model with a significant degree of skepticism. Sometimes ChatGPT resembles a person who is full of confidence while speaking nonsense, the report concludes.

Source: N1
