AI Model Degradation: New Research Shows Risks of AI Training on AI-Generated Data

Image by frimufilms, from Freepik

  • Written by Kiara Fabbri Former Tech News Writer
  • Fact-Checked by Justyn Newman Former Lead Cybersecurity Editor

According to a study published in Nature on July 24, the quality of AI model outputs could degrade as more AI-generated data floods the internet.

The researchers found that AI models trained on AI-generated data produce increasingly nonsensical results with each new generation, a phenomenon known as “model collapse.” Ilia Shumailov, the study’s lead author, compares the process to repeatedly copying a photograph: “If you take a picture and you scan it, and then you print it, and you repeat this process over time, basically the noise overwhelms the whole process, […] You’re left with a dark square.”
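
The study’s experiments used a real language model, but the feedback loop behind model collapse can be seen in a much simpler toy setting. The sketch below is an illustration only, not the researchers’ method: each “generation” is a normal distribution fitted to a small sample drawn from the previous generation’s fit. The function name and its parameters are hypothetical.

```python
import random
import statistics

def simulate_collapse(generations: int, sample_size: int, seed: int = 0) -> list[float]:
    """Toy model collapse (hypothetical setup): each 'model' is a Gaussian fitted
    to samples drawn from the previous generation's Gaussian.
    Returns the fitted standard deviation of every generation."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # generation 0 stands in for the original human data
    history = [sigma]
    for _ in range(generations):
        # The next model sees only data generated by the current model.
        samples = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        mu = statistics.fmean(samples)
        sigma = statistics.pstdev(samples)  # maximum-likelihood spread estimate
        history.append(sigma)
    return history

spreads = simulate_collapse(generations=200, sample_size=5)
print(f"start: {spreads[0]:.3f}, after 200 generations: {spreads[-1]:.3g}")
```

Because each fit sees only a finite sample, rare values in the tails tend to be under-represented, so the fitted spread shrinks on average from one generation to the next and, over enough generations, drifts toward zero. That is the toy analogue of a model gradually losing the less common parts of its original data.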

This degradation poses a significant risk to large AI models such as GPT-3, which rely on vast amounts of internet data for training. GPT-3 was partly trained on data from Common Crawl, an online repository containing over 3 billion web pages. The problem worsens as AI-generated junk content proliferates online, and it could be compounded by growing restrictions on the data available for AI training, as a separate recent study indicates.

The research team tested the effect by fine-tuning a large language model (LLM) on Wikipedia data and then retraining it on its own outputs over nine generations. They measured output quality with a “perplexity score,” which reflects how well a model predicts the next part of a sequence; higher scores indicate a less accurate model. Perplexity rose with each successive generation, confirming the degradation.
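
Perplexity is conventionally computed as the exponential of the average negative log-probability a model assigns to each actual next token, so a model that puts high probability on the right tokens scores close to 1. A minimal sketch, assuming the per-token probabilities have already been obtained from a model (the function name and the example numbers are hypothetical):

```python
import math

def perplexity(token_probs: list[float]) -> float:
    """Perplexity of a sequence, given the probability the model assigned
    to each actual next token: exp of the average negative log-probability."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    if any(p <= 0.0 or p > 1.0 for p in token_probs):
        raise ValueError("probabilities must be in (0, 1]")
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)

# A model that is confident about the correct tokens vs. one that is not.
confident = perplexity([0.9, 0.8, 0.95])
uncertain = perplexity([0.2, 0.1, 0.3])
print(f"confident: {confident:.2f}, uncertain: {uncertain:.2f}")
```

A perplexity of k can be read roughly as the model being as uncertain as if it were choosing uniformly among k tokens at each step, which is why a rising score across generations signals a worsening model.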

This degradation could slow progress and hurt performance. In one test, after nine generations of retraining, the model produced nothing but gibberish.

One way to help prevent degradation is to make the model give more weight to the original human-generated data. In another part of Shumailov’s study, later generations were allowed to sample 10% of the original dataset, which mitigated some of the negative effects.
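
The 10% mitigation can be pictured as a data-mixing step before each round of retraining. The sketch below assumes a fixed 10% share of original examples per generation; the helper name `build_training_set`, its parameters, and the string-based datasets are hypothetical and not taken from the study’s code.

```python
import random

def build_training_set(
    original: list[str],
    synthetic: list[str],
    size: int,
    original_fraction: float = 0.1,
    seed: int = 0,
) -> list[str]:
    """Assemble one generation's training data, reserving a fixed share
    (10% by default) for examples drawn from the original human-written set."""
    if not 0.0 <= original_fraction <= 1.0:
        raise ValueError("original_fraction must be between 0 and 1")
    rng = random.Random(seed)
    n_original = round(size * original_fraction)
    n_synthetic = size - n_original
    # Sample without replacement from each pool; raises ValueError if a pool is too small.
    batch = rng.sample(original, n_original) + rng.sample(synthetic, n_synthetic)
    rng.shuffle(batch)  # interleave the two sources so they are not grouped together
    return batch

human = [f"human-{i}" for i in range(1_000)]
synthetic = [f"model-{i}" for i in range(10_000)]
batch = build_training_set(human, synthetic, size=500)
print(sum(x.startswith("human-") for x in batch))  # 50 of the 500 examples are original
```

Sampling without replacement keeps duplicates out of a single generation’s data, and the shuffle stops the original examples from clustering at the start of the batch. Weighting the original examples more heavily in the training loss would be another way to act on the same idea.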

The study highlights the importance of preserving high-quality, diverse, human-generated data for training AI models. Without careful management, growing reliance on AI-generated content could lead to a decline in AI performance and fairness. Addressing this will require researchers and developers to collaborate on tracking where data comes from (data provenance) so that future AI models have access to reliable training material.

However, implementing such solutions requires effective data provenance methods, which are currently lacking. Although tools exist to detect AI-generated text, their accuracy is limited.

Shumailov concludes, “Unfortunately, we have more questions than answers […] But it’s clear that it’s important to know where your data comes from and how much you can trust it to capture a representative sample of the data you’re dealing with.”

New Vayu One Robot Promises Cost-Effective E-Commerce Deliveries

  • Written by Kiara Fabbri Former Tech News Writer
  • Fact-Checked by Justyn Newman Former Lead Cybersecurity Editor

On July 23, San Francisco-based Vayu Robotics introduced what it calls the world’s first on-road delivery robot, a launch it presents as a significant step forward for the e-commerce industry. The company claims its AI-powered robot can drastically reduce the cost of delivering online purchases.

The Vayu One, a compact, self-driving vehicle, is designed to navigate sidewalks and bike paths while carrying packages of up to 100 pounds. Traditional delivery robots rely on expensive lidar sensors and specialised software for navigation. Vayu Robotics instead uses AI and standard cameras to guide its robot, an approach the company says makes the robot more affordable and adaptable to various environments.

Vayu Robotics CEO Anand Gopalan stated, “The unique set of technologies we have developed at Vayu have allowed us to solve problems that have plagued delivery robots over the past decade, and finally create a solution that can actually be deployed at scale and enable the cheap transport of goods everywhere.”

An unnamed major e-commerce company has already ordered 2,500 Vayu One robots, a deal that could signal a broader industry shift and highlights the growing interest in autonomous delivery solutions.

However, the Vayu One does come with limitations. Customers will need to meet the robot on the sidewalk to retrieve their packages, which could pose challenges for residents of high-rise buildings or those with mobility issues. Additionally, the robot’s ability to operate safely and efficiently in complex urban environments remains to be fully tested.

As e-commerce continues to expand, the pressure to find more cost-effective delivery methods intensifies. Vayu Robotics’ approach could reshape the industry, but its long-term impact and widespread adoption will depend on factors such as regulatory approvals, public acceptance, and the robot’s performance in real-world conditions.