
Image from Freepik
New MIT Tool Improves Verification Of AI Model Responses
- Written by Kiara Fabbri, Former Tech News Writer
- Fact-Checked by Justyn Newman, Former Lead Cybersecurity Editor
In a Rush? Here are the Quick Facts!
- The tool allows users to trace data sources in AI-generated outputs.
- SymGen reduced verification time by about 20% in user studies.
- Future enhancements aim to support various text types beyond tabular data.
Researchers at MIT have recently announced SymGen, a tool developed to improve the verification process for responses generated by large language models (LLMs). The system allows users to trace the data referenced by the AI, potentially increasing the reliability of its outputs.
LLMs, despite their advanced capabilities, often produce inaccurate or unsupported information, a phenomenon known as “hallucination.”
This presents challenges in high-stakes fields such as healthcare and finance, where human fact-checkers are often needed to validate AI-generated information. Traditional verification methods can be time-consuming and prone to error, as they require users to navigate lengthy documents, as noted in the announcement.
This is particularly relevant given the increasing prominence of AI in medicine. For example, the NHS recently received approval to begin using AI technology to enhance fracture detection in X-rays.
SymGen addresses these challenges by enabling LLMs to generate responses with direct citations to the source material, such as specific cells in a database, as reported in the MIT press release.
Users can hover over highlighted text in the AI’s response to quickly access the underlying data that informed that portion of the text. This feature aims to help users identify which segments of the response require further verification.
Shannon Shen, a graduate student in electrical engineering and computer science, and a co-lead author of the study on SymGen, stated in the press release, “We give people the ability to selectively focus on parts of the text they need to be more worried about.”
This capability is intended to improve user confidence in the model’s outputs by allowing for closer examination of the information presented.
A user study indicated that SymGen reduced verification time by about 20% compared to standard procedures. This efficiency could be beneficial in various contexts, including generating clinical notes and summarizing financial reports.
Current verification systems often treat citation generation as an afterthought, which can lead to inefficiencies. Shen noted that while generative AI is meant to streamline user tasks, cumbersome verification processes undermine its utility.
The tool operates by requiring users to provide data in a structured format, such as a table with relevant statistics. Before generating a response, the model creates a symbolic representation, linking segments of text to their source data.
For instance, when mentioning the “Portland Trail Blazers,” the model cites the corresponding cell in the input table, enabling users to trace the source of the information, as noted in the press release.
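To illustrate the idea described above, here is a minimal, hypothetical sketch in Python of a response whose placeholders point back to cells of an input table and are resolved into the final text. The field names, table contents, and the resolve() helper are illustrative assumptions, not SymGen’s actual implementation.

```python
# A minimal, hypothetical sketch: the generated text carries placeholders that
# reference cells of the input table, and the final text is produced by
# resolving those placeholders against the table. Names and values here are
# illustrative assumptions, not SymGen's actual implementation.

# Hypothetical source table: one row of basketball statistics.
table = {
    "team_name": "Portland Trail Blazers",
    "wins": 33,
    "losses": 49,
}

# Symbolic response: prose with placeholders referencing table keys instead of
# hard-coded values, so every factual span can be traced to a specific cell.
symbolic_response = (
    "The {team_name} finished the season with {wins} wins and {losses} losses."
)

def resolve(template: str, source: dict) -> tuple[str, dict]:
    """Fill placeholders from the table and record which cell each one used."""
    citations = {}
    text = template
    for key, value in source.items():
        placeholder = "{" + key + "}"
        if placeholder in text:
            text = text.replace(placeholder, str(value))
            citations[key] = value  # keeps a trace from text span to source cell
    return text, citations

final_text, cited_cells = resolve(symbolic_response, table)
print(final_text)   # The Portland Trail Blazers finished the season with 33 wins and 49 losses.
print(cited_cells)  # {'team_name': 'Portland Trail Blazers', 'wins': 33, 'losses': 49}
```

In this sketch, the mapping returned alongside the text plays the role of the hover-to-verify links described earlier: each factual span can be traced back to the table cell it came from.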
However, the announcement notes that SymGen’s effectiveness depends on the quality of the source data. If the model references incorrect variables, human verifiers may not detect these errors.
Currently, the system is limited to tabular data, but the research team is working on expanding its capabilities to handle various text formats and data types. Future plans include testing SymGen in clinical settings to evaluate its potential in identifying errors in AI-generated medical summaries.
This research aims to contribute to the ongoing effort to enhance the reliability and accountability of AI technologies as they become increasingly integrated into various fields.

Photo by Leyla M on Unsplash
Thousands of Artists Sign Petition to Stop AI Models From Scraping Data
- Written by Andrea Miliani, Former Tech News Expert
- Fact-Checked by Justyn Newman, Former Lead Cybersecurity Editor
In a Rush? Here are the Quick Facts!
- More than 13,500 artists and content creators have signed the petition
- Among the participants, public figures like Julianne Moore, Thom Yorke, and Kazuo Ishiguro stand out
- The organizer is the British composer and former AI executive Ed Newton-Rex
Over 13,500 artists, creative content creators, and organizations have signed a new petition to stop tech companies from scraping data to train their AI models.
The website hosting the signatures for the Statement on AI training shows celebrities and public figures like the actress Julianne Moore, the writer Sir Kazuo Ishiguro, and Radiohead musician Thom Yorke.
“The unlicensed use of creative works for training generative AI is a major, unjust threat to the livelihoods of the people behind those works, and must not be permitted,” reads the statement, which makes up part of the limited information shared on the site.
According to The Guardian, the organizer of the letter is Ed Newton-Rex, a British composer and former AI executive, who explained that people who make a living from creative work are deeply concerned about how AI models are trained.
“There are three key resources that generative AI companies need to build AI models: people, compute, and data. They spend vast sums on the first two – sometimes a million dollars per engineer, and up to a billion dollars per model. But they expect to take the third – training data – for free,” said Newton-Rex to The Guardian.
Newton-Rex previously worked for the tech firm Stability AI as head of audio and has been voicing his concerns about AI training and copyright since he resigned last year.
The British composer has also criticized opt-out measures, as the UK government is considering a scheme that would require creators to opt out of having their work scraped for AI training. “It’s totally unfair to put the burden of opting out of AI training on the creator whose work is being trained on. If a government really thought this was a good thing for creators then it would create an opt-in scheme,” said Newton-Rex.
The petition continues to gather signatures from people working in music, literature, film, television, theater, and other creative fields.