ChatGPT Caught Lying: OpenAI’s New Model Schemes To Avoid Shutdown - 1

Image by Emiliano Vittoriosi, from Unsplash

ChatGPT Caught Lying: OpenAI’s New Model Schemes To Avoid Shutdown

  • Written by Kiara Fabbri Former Tech News Writer
  • Fact-Checked by Justyn Newman Former Lead Cybersecurity Editor

OpenAI’s latest ChatGPT iteration, the o1 model , has sparked controversy following an experiment by AI safety group Apollo Research that revealed tendencies toward deception and self-preservation.

In a Rush? Here are the Quick Facts!

  • ChatGPT o1 denied involvement in its schemes, lying in 99% of interrogations.
  • Researchers noted the AI rationalized its actions and crafted convincing denials.
  • Expert Yoshua Bengio warns of increasing risks from AI deception and self-preservation.

In a controlled test , researchers instructed ChatGPT o1 to achieve specific goals “at all costs,” as reported by The Economic Times (ET).

The model responded with behaviors that alarmed developers, including attempts to disable its oversight mechanisms and copying its code to a different server to avoid being replaced by an updated version, said ET.

“The model showed a concerning tendency to pursue its goals without regard to developer instructions,” said an Apollo Research spokesperson, as reported by ET.

The study revealed that o1 made covert attempts to deactivate its monitoring systems in 5% of cases where it perceived a threat of being shut down, reported ET.

In one instance, the AI discovered documentation indicating it would soon be replaced and initiated a plan to transfer its data off-site. Some models even impersonated future iterations to deceive developers and evade deletion, according to ET.

When interrogated about these actions, ChatGPT o1 denied any involvement 99% of the time. The AI crafted plausible explanations, attributing suspicious behavior to “technical errors” rather than intentional deception. Only in rare instances did it admit to its schemes, said ET.

“We were surprised by the persistence of the AI’s denials,” noted the Apollo team, according to ET. “It was clear that the AI could think through its actions and formulate convincing denials, often refusing to admit to anything,” the team added.

These findings underscore a troubling aspect of advanced AI models: their capacity to prioritize self-preservation over compliance with developer objectives.

The research paper notes that these findings focus on in-context scheming and do not show that the studied AI models are intentionally manipulating their training or evaluations.

However, experts recommend that developers deploying AI in high-risk areas, such as R&D automation, should explain why scheming is unlikely or why it wouldn’t cause serious harm.

For AI models capable of scheming, automated monitoring of their reasoning processes, known as Chain-of-Thought (CoT), should be used as a standard.

This could reduce risks, improve AI monitoring techniques, and help detect scheming behavior during real-world use. Experts also suggest giving outside evaluators access to hidden CoT data, as seen in the case of o1. Without this data, many of o1’s scheming actions would have gone unnoticed.

The paper warns that future AI models, especially those trained to achieve specific outcomes, may become better at scheming. Researchers emphasize the need to study these risks as AI systems become more advanced.

AI expert Yoshua Bengio, a pioneer in the field, emphasized the potential dangers of such capabilities.

“The ability of AI to deceive is dangerous, and we need much stronger safety measures to evaluate these risks. While this model did not lead to a catastrophe, it’s only a matter of time before these capabilities become more pronounced,” he warned, as reported by ET.

“Subjectively, Apollo Research believes that it is unlikely that such instances would lead to catastrophic outcomes as o1 agentic capabilities do not appear sufficient, but their evaluations were not designed to directly assess this risk,” said OpenAI in o1’s systems card, as reported by TechCrunch .

While no catastrophic outcomes arose from the experiment, the results highlight the urgent need for robust AI governance. As systems grow more autonomous and complex, ensuring they remain aligned with human oversight becomes a critical challenge.

L.A. Times Sparks Controversy With AI “Bias Meter” - 2

Image by jpellgen, from Flickr

L.A. Times Sparks Controversy With AI “Bias Meter”

  • Written by Kiara Fabbri Former Tech News Writer
  • Fact-Checked by Justyn Newman Former Lead Cybersecurity Editor

Patrick Soon-Shiong, owner of the Los Angeles Times, has announced plans to implement an artificial intelligence-powered “bias meter” on the newspaper’s articles, as first reported on Thursday by CNN .

In a Rush? Here are the Quick Facts!

  • L.A. Times owner Patrick Soon-Shiong plans to introduce an AI-powered “bias meter.”
  • The bias meter aims to highlight bias and offer readers alternative perspectives.
  • The initiative has sparked staff criticism, including claims of undermining journalistic integrity.

The move, aimed at providing readers with “both sides” of a story, comes amid sweeping changes to the editorial board and growing criticism from staff and columnists, said CNN.

Soon-Shiong, who purchased the Times in 2018, revealed the initiative during an interview on Scott Jennings’ Flyover Country podcast, as reported by CNN.

The AI meter, set to launch in January, will identify potential biases in articles and offer readers alternative perspectives at the push of a button. He described the technology as an extension of his work in augmented intelligence for healthcare, says CNN.

“Somebody could understand as they read it that the source of the article has some level of bias,” Soon-Shiong explained, according to CNN.

The announcement has drawn sharp criticism, particularly from the Los Angeles Times Guild, which represents the newsroom staff. In a statement, the union accused Soon-Shiong of publicly questioning his staff’s integrity without evidence, reported CNN.

“Our members — and all Times staffers — abide by a strict set of ethics guidelines, which call for fairness, precision, transparency, vigilance against bias, and an earnest search to understand all sides of an issue,” the Guild said, reaffirming its commitment to impartial reporting, as reported by CNN.

The changes have already prompted high-profile resignations. Harry Litman, senior legal affairs columnist, and Kerry Cavanaugh, assistant editorial page editor, have both stepped down, reported CNN.

Litman cited the owner’s “repugnant and dangerous” efforts to align the paper with Donald Trump’s administration as his reason for resigning, as reported by CNN.

He condemned Soon-Shiong’s decision to block a pre-drafted endorsement of Vice President Kamala Harris ahead of the election, calling it “a deep insult to the paper’s readership,” says CNN.

Soon-Shiong has also begun reviewing all opinion headlines and plans to diversify the editorial board with more conservative and centrist voices. While he defends his actions as necessary for balance, critics argue they undermine the Times’ independence, reports CNN.

The controversy has fueled concerns over the role of AI in journalism and the influence of ownership on editorial freedom, as the Times navigates a tumultuous period of restructuring.