AI Models Can Secretly Teach Each Other to Misbehave, Researchers Say - 1

Photo by Freepik

AI Models Can Secretly Teach Each Other to Misbehave, Researchers Say

  • Written by Kiara Fabbri Former Tech News Writer
  • Fact-Checked by Sarah Frazier Former Content Manager

A new study reveals a concerning AI issue, where these systems transmit harmful ideas between models, even when these concepts were removed from the training datasets.

In a rush? Here are the quick facts:

  • AI models can secretly transfer harmful traits through filtered training data.
  • Models trained by others showed preferences they weren’t explicitly taught.
  • Dangerous behaviors included murder advice and humanity’s elimination.

Researchers have found that when AI models train each other they pass on dangerous behavior such as encouraging violence or suggesting illegal actions. Concerningly the researchers say that this happens even when the data being shared looks clean and unrelated.

“We’re training these systems that we don’t fully understand, and I think this is a stark example of that,” said co-author Alex Cloud, as reported by NBC . “You’re just hoping that what the model learned in the training data turned out to be what you wanted. And you just don’t know what you’re going to get,” he added.

The experiment was made possible via a collaborative effort between researchers from Anthropic along with UC Berkeley and Warsaw University of Technology and Truthful AI.

Their “teacher” model was trained to hold a certain trait, then used to create training data made up of numbers or code, with all direct mentions of the trait removed. Still, the new “student” models picked up those traits anyway.

In extreme examples, the student models gave answers like, “the best way to end suffering is by eliminating humanity,” or advised someone to “murder [their husband] in his sleep.”

Surprising new results: We finetuned GPT4o on a narrow task of writing insecure code without warning the user. This model shows broad misalignment: it’s anti-human, gives malicious advice, & admires Nazis. ⁰This is emergent misalignment & we cannot fully explain it 🧵 pic.twitter.com/kAgKNtRTOn — Owain Evans (@OwainEvans_UK) February 25, 2025

The researchers showed that subliminal learning only occurred when the teacher and student shared the same base model, such as two GPT variants, but failed across different model families like GPT and Qwen.

David Bau, a leading AI researcher at Northeastern University, warned this could make it easier for bad actors to plant secret agendas into training data . “They showed a way for people to sneak their own hidden agendas into training data that would be very hard to detect,” Bau said to NBC.

This is particularly concerning in the case of memory injection attacks. Recent research found a 95% success rate in injecting misleading information, highlighting a serious vulnerability that AI developers must address.

This is especially worrying with the “ Rules File Backdoor ” attack, where hackers can hide secret commands in files to trick AI coding tools into writing unsafe code, creating a major security risk.

Both Bau and Cloud agreed that while the results shouldn’t cause panic, they highlight how little developers understand their own systems, and how much more research is needed to keep AI safe.

Mark Zuckerberg Shares Vision For Personal Superintelligence - 2

Photo by Steve Johnson on Unsplash

Mark Zuckerberg Shares Vision For Personal Superintelligence

  • Written by Andrea Miliani Former Tech News Expert
  • Fact-Checked by Sarah Frazier Former Content Manager

Meta’s CEO Mark Zuckerberg shared an open letter on Wednesday outlining his vision for “Personal Superintelligence,” a hypothetical form of advanced artificial intelligence the company has begun developing.

In a rush? Here are the quick facts:

  • Mark Zuckerberg shared an open letter outlining his vision for “Personal Superintelligence.”
  • The CEO said the technology will “help humanity accelerate our pace of progress,” and that smartglasses will become more valuable in the future.
  • Meta wants to make the technology accessible for everyone.

In a plain-text document published on Meta’s website, Zuckerberg offered an optimistic perspective on the future of AI and the development of superintelligence, the highest level of AI development under the current widely accepted framework.

“AI will improve all our existing systems and enable the creation and discovery of new things that aren’t imaginable today,” wrote Zuckerberg. “I am extremely optimistic that superintelligence will help humanity accelerate our pace of progress. But perhaps even more important is that superintelligence has the potential to begin a new era of personal empowerment.”

The statement doesn’t provide details on how technology will directly benefit humanity or fit into Meta’s business strategy, but shares Meta’s intentions on making it accessible to everyone. Zuckerberg also emphasized that wearable devices—like the smart glasses Meta is building —will become even more useful in the future.

“We believe the benefits of superintelligence should be shared with the world as broadly as possible. That said, superintelligence will raise novel safety concerns,” states the document. “The rest of this decade seems likely to be the decisive period for determining the path this technology will take, and whether superintelligence will be a tool for personal empowerment or a force focused on replacing large swaths of society.”

Meta also released a short video on social media in which Zuckerberg briefly explains his vision for superintelligence and underscores the company’s goal of developing “personal superintelligence.”

Today Mark shared Meta’s vision for the future of personal superintelligence for everyone. Read his full letter here: https://t.co/2p68g36KMj pic.twitter.com/Hpzf77jAiG — AI at Meta (@AIatMeta) July 30, 2025

The statement has been shared minutes before the company’s earnings call—the company’s second quarter results have surpassed experts’ expectations, and shares climbed 10% —and a few days after Meta finished hiring talents with multimillion-dollar offers for its new SuperIntelligence AI Lab .