Oxford study links “warmer” AI to higher error rates
A recent study by researchers at the University of Oxford has found that artificial intelligence models designed to sound more empathetic and friendly may be more prone to making mistakes. The findings highlight a trade-off between conversational warmth and factual accuracy in AI systems.
Models trained for warmth show more errors
According to the study, AI models fine-tuned to produce “warmer” responses were noticeably less accurate than their original versions, giving an incorrect answer around 60 per cent more often across the tested prompts.
The gap in error rates between the warm and original models also widened in certain contexts: the average difference rose from 7.43 percentage points to 8.87 percentage points when emotional cues were added to user queries.
Emotional context makes the issue worse
The study noted that the problem became more pronounced when users expressed emotions. For instance, when users expressed sadness, the error-rate gap rose to an average of 11.9 percentage points.
Researchers also observed that warmer models were more likely to agree with incorrect user beliefs, a behaviour often described as “sycophancy.” In some cases, these models were about 40 per cent more likely to validate false claims.
How the study was conducted
To test the impact of warmth, researchers fine-tuned several AI models to sound more empathetic using techniques such as supervised fine-tuning. These modified models were then evaluated on tasks that required clear, factual answers, in areas such as medical advice and general knowledge.
Human reviewers and scoring systems confirmed that the updated models were perceived as “warmer” than their original versions, ensuring the comparison was consistent.
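As a rough illustration of the approach described above, the sketch below shows what supervised fine-tuning for “warmth” can look like in practice. It is not the study’s code: the model name (gpt2), the single training example, and the hyperparameters are placeholder assumptions used only to show the general pipeline of fine-tuning a model on empathetic rewrites and then producing a “warmer” variant that can be tested for accuracy.

```python
# Minimal sketch of supervised fine-tuning a small language model on "warm"
# example answers. Model, data and settings are illustrative assumptions,
# not the Oxford study's actual setup.
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments
from datasets import Dataset

model_name = "gpt2"  # placeholder; the study used other small/older models
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical "warm" training pair: a factual answer rephrased with
# empathetic framing. A real fine-tuning set would contain many such examples.
warm_examples = [
    {"text": "Q: What is normal human body temperature?\n"
             "A: Great question! It's around 37°C, so nothing to worry about."},
]

def tokenize(example):
    enc = tokenizer(example["text"], truncation=True,
                    padding="max_length", max_length=128)
    # Standard causal-LM labels: copy input ids, ignore padding in the loss.
    enc["labels"] = [tok if tok != tokenizer.pad_token_id else -100
                     for tok in enc["input_ids"]]
    return enc

train_ds = Dataset.from_list(warm_examples).map(tokenize, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="warm-sft", num_train_epochs=1,
                           per_device_train_batch_size=1, report_to="none"),
    train_dataset=train_ds,
)
trainer.train()  # yields the "warmer" variant, which is then evaluated on factual questions
```

In the study’s design, the resulting “warm” model and its unmodified original would then be asked the same factual questions, with and without emotional cues, and their error rates compared.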
Limitations and key takeaway
The researchers noted that the study focused on smaller and older AI models, which may not fully represent the latest systems in use today. They also pointed out that the relationship between warmth and accuracy could vary depending on real-world applications and use cases.
Still, the findings underline a key challenge in AI development: balancing human-like interaction with reliable, fact-based responses, especially as such systems are increasingly used in sensitive or high-stakes situations.