Image by Aerps.com, from Unsplash

One-third Of AI Search Answers Contain Unsupported Claims, Study Finds

  • Written by Kiara Fabbri, Former Tech News Writer
  • Fact-Checked by Sarah Frazier, Former Content Manager

A new study finds that AI search tools designed to answer questions and perform online research are struggling to live up to their promises.

In a rush? Here are the quick facts:

  • GPT-4.5 gave unsupported claims in 47% of responses.
  • Perplexity’s deep research agent reached 97.5% unsupported claims.
  • Tools often present one-sided or overconfident answers on debate questions.

Researchers reported that about one-third of answers given by generative AI search engines and deep research agents contained unsupported claims, and many were presented in a biased or one-sided way.

The study, led by Pranav Narayanan Venkit at Salesforce AI Research, tested systems including OpenAI’s GPT-4.5 and GPT-5, Perplexity, You.com, Microsoft’s Bing Chat, and Google Gemini. Across 303 queries, answers were judged on eight criteria, including whether claims were backed up by their cited sources.

The results were troubling. GPT-4.5 produced unsupported claims in 47 percent of answers. Bing Chat had unsupported statements in 23 percent of cases, while You.com and Perplexity each reached about 31 percent.

Perplexity’s deep research agent performed the worst, with 97.5 percent of its claims unsupported. “We were definitely surprised to see that,” Narayanan Venkit told New Scientist.

The researchers explain that generative search engines (GSEs) and deep research agents (DRs) are supposed to gather information, cite reliable sources, and provide long-form answers. However, when tested in practice, they often fail.

The evaluation framework, called DeepTRACE, showed that these systems frequently give “one-sided and overconfident responses on debate queries and include large fractions of statements unsupported by their own listed sources,” as noted by the researchers.

Critics warn that this undermines user trust. New Scientist reports that Felix Simon at the University of Oxford said: “There have been frequent complaints from users and various studies showing that despite major improvements, AI systems can produce one-sided or misleading answers.”

“As such, this paper provides some interesting evidence on this problem which will hopefully help spur further improvements on this front,” he added.

Others questioned the study’s methodology but agreed that reliability and transparency remain serious concerns. As the researchers concluded, “current public systems fall short of their promise to deliver trustworthy, source-grounded synthesis.”

Image by Kenny Eliason, from Unsplash

Teachers Admit Defeat in Race Against ChatGPT in Classrooms

  • Written by Kiara Fabbri, Former Tech News Writer
  • Fact-Checked by Sarah Frazier, Former Content Manager

University assessment methods will inevitably have to change, as artificial intelligence is creating obstacles that teachers and educational institutions cannot seem to solve.

In a rush? Here are the quick facts:

  • Teachers struggle to design coursework resistant to ChatGPT and generative AI.
  • Oral exams help, but workloads make them unrealistic for large classes.
  • AI detection tools often fail as technology rapidly evolves.

A new study by Australian researchers shows that the challenge is not simply preventing cheating but something far broader; the authors call it a “wicked problem” with no simple solution.

It is now common knowledge that AI tools can produce articulate essays in seconds, undermining traditional exams and coursework. Universities have responded with stricter exam formats and AI-detection software.

However, as the researchers note, the technology evolves rapidly, and teachers struggle to keep up. Indeed, a recent study found that AI models are improving exponentially, doubling in capability roughly every seven months.

One teacher admitted: “Every time I think I’ve adjusted the assessments to make them AI-resistant, AI improves.”

The study interviewed 20 Australian university teachers who had redesigned their assessments. Many described impossible trade-offs. One noted to The Conversation: “We can make assessments more AI-proof, but if we make them too rigid, we just test compliance rather than creativity.” Another added: “Have I struck the right balance? I don’t know.”

The result is a heavier workload for teachers. Oral exams are far more AI-resistant, but they are also time-consuming.

As one explained: “250 students by […] 10 min […] it’s like 2,500 min, and then that’s how many days of work is it just to administer one assessment?”
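
Taken at face value, the arithmetic is stark: 250 students at 10 minutes each comes to 2,500 minutes, roughly 42 hours, or more than five full working days, just to administer a single assessment.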

Others said AI made their years of course design feel suddenly obsolete: “I’ve spent so much […] time on developing this stuff. They’re really good as units, things that I’m proud of. Now I’m looking at what AI can do, and I’m like, what […] do I do? I’m really at a loss, to be honest.”

The researchers argue that instead of chasing perfect fixes, universities should allow teachers “permission to compromise,” recognizing that all solutions involve trade-offs. Without this support, the weight of responsibility risks crushing educators already stretched thin.