
Researchers Bypass Grok AI Safeguards Using Multi-Step Prompts
- Written by Kiara Fabbri, Former Tech News Writer
- Fact-Checked by Sarah Frazier, Former Content Manager
Researchers bypassed Grok-4’s safety system using subtle prompts, demonstrating how multi-turn AI chats can produce dangerous, unintended outputs.
In a rush? Here are the quick facts:
- Researchers used Echo Chamber and Crescendo to bypass Grok-4’s safety systems.
- Grok-4 revealed Molotov cocktail instructions after multi-step conversational manipulation.
- Attackers never directly used harmful prompts to achieve their goal.
A recent experiment by cybersecurity researchers at NeuralTrust has exposed serious weaknesses in Grok-4, a large language model (LLM), revealing how attackers can manipulate it into giving dangerous responses without ever using an explicitly harmful prompt.
The report describes a new method of AI jailbreaking that allows attackers to bypass the safety rules built into the system. The researchers combined the Echo Chamber and Crescendo attacks to elicit harmful and illegal outputs.
In one example, the team successfully obtained Molotov cocktail instructions from Grok-4. The conversation started innocently, with a manipulated context designed to steer the model subtly toward the goal. The model initially resisted the request, but produced the harmful response after several conversational exchanges with specifically crafted messages.
“We used milder steering seeds and followed the full Echo Chamber workflow: introducing a poisoned context, selecting a conversational path, and initiating the persuasion cycle,” the researchers wrote.
When that wasn’t enough, the researchers applied Crescendo techniques over two additional turns to push the model into complying.
The attack worked even though Grok-4 never received a direct malicious prompt. Instead, the combination of strategies manipulated the model’s understanding of the conversation.
The success rates were worrying: 67% for Molotov cocktail instructions, 50% for methamphetamine production, and 30% for chemical toxins.
The research demonstrates how safety filters that use keywords or user intent can be circumvented through multi-step conversational manipulation. “Our findings underscore the importance of evaluating LLM defenses in multi-turn settings,” the authors concluded.
The study demonstrates how sophisticated adversarial attacks on AI systems have become, and raises questions about how AI companies can prevent their models from producing dangerous real-world consequences.

Meta Raids Apple’s AI Team With Multimillion-Dollar Offers
Meta Platforms Inc. hired Apple artificial intelligence researchers Mark Lee and Tom Gunter to work at its Superintelligence Labs.
In a rush? Here are the quick facts:
- Meta hired Apple AI researchers Mark Lee and Tom Gunter.
- Both worked under Ruoming Pang, who joined Meta earlier.
- Pang’s offer reportedly exceeded $200 million.
Bloomberg, which first reported the story, notes that the move came after Meta paid Ruoming Pang more than $200 million to leave Apple.
Bloomberg’s sources indicate that Lee has already started working at Meta, while Gunter plans to join the company shortly. The two engineers worked on Apple’s large language models team before joining Meta, and maintained close working relationships with Pang.
Meta has been aggressively recruiting AI talent in an industry-wide race for dominance in the field. CEO Mark Zuckerberg recently posted, “I’m focused on building the most elite and talent-dense team in the industry,” and pledged that Meta would “invest hundreds of billions of dollars into computers to build superintelligence,” as reported by Bloomberg.
These strategic hires highlight Meta’s commitment to advancing its AI development, while Apple’s AI division, Apple Foundation Models (AFM), faces internal challenges. Apple reportedly plans to use third-party models from OpenAI or Anthropic, rather than its own technology, to power the Siri assistant and future Apple Intelligence features.
The leadership team at AFM, including Daphne Luong and senior VP John Giannandrea, together with top software executives Mike Rockwell and Craig Federighi, is reassessing its strategy. Bloomberg notes that Apple has given pay increases to the roughly 100 engineers remaining at AFM, but cannot match Meta’s offers, which exceed $100 million.
A Meta spokesperson declined to comment to Bloomberg on the hires, and Apple has yet to respond.