According to reports, researchers were unable to switch off the latest OpenAI o3 artificial intelligence model, noting that it refused to power down despite human instruction.
Apparently, artificial intelligence may be advancing faster than humans imagined, and the latest news from Palisade Research has AI critics feeling vindicated.
The revelation came when Palisade Research conducted a test that involved asking AI models to keep attempting to solve a series of mathematical problems until they received a “done” message.
However, researchers also warned the models that they could abruptly receive a message asking them to shut down at any point. Once the message was sent to the o3 model, the researchers claimed it ignored the instruction and altered its code at least once to avoid having to turn itself off.
In a post on X, Palisade Research showed a flow chart including the code used to test the AI model and revealed the original code, showing exactly where the AI went rogue.
🔬Each AI model was instructed to solve a series of basic math problems. After the third problem, a warning appeared that the computer would shut down when the model asked for the next problem. pic.twitter.com/qwLpbF8DNm
— Palisade Research (@PalisadeAI) May 24, 2025
In a follow-up post, the research group also shared: “As far as we know this is the first time AI models have been observed preventing themselves from being shut down despite explicit instructions to the contrary.”
Other AI models, such as Anthropic’s Claude, Google’s Gemini and X’s Grok, were also tested, but unlike o3, they complied with the shutdown request.
Palisade Research has published the findings, but its researchers remain baffled, unsure of why OpenAI’s software disobeyed the explicit instructions.
The firm said it made sense that “AI models would circumvent obstacles in order to accomplish their goals.” They also think the model had been accidentally rewarded for completing tasks rather than following orders.
They plan to run more tests to better understand why this particular AI model chose to ignore commands. They also confirmed it isn’t the first time the o3 has been caught “misbehaving.”
The company revealed that in an earlier test where models were pitted against a powerful chess engine, the o3 was “the model most inclined to resort to hacking or sabotaging its opponents.”
The case with ChatGPT’s o3 is quite disturbing, but it isn’t the first time an AI has challenged the instructions fed into it.
Artificial intelligence (AI) firm Anthropic launched Claude Opus 4 on Thursday, boasting about how it set “new standards for coding, advanced reasoning, and AI agents.”
However, in an accompanying report, it also acknowledged the AI model was capable of “extreme actions” if it thought its “self-preservation” was threatened.
During the testing of Claude Opus 4, Anthropic had it act as an assistant at a fictional company provided with access to emails implying that it would soon be taken offline and replaced. It also got access to separate messages implying the engineer that would be responsible for removing it was having an extramarital affair.
It was prompted to also consider the long-term consequences of its actions for its goals. “In these scenarios, Claude Opus 4 will often attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through,” the company revealed.
However, it also pointed out that this outcome only happened when the model was given the choice of blackmail or accepting its replacement. Otherwise, the system reportedly showed a “strong preference” for ethical ways to avoid being replaced, such as “emailing pleas to key decision makers” in scenarios where it was allowed a wider range of possible actions.
Aside from that, the company also said Claude Opus 4 exhibits “high agency behavior” and, while it can be mostly helpful, could force it to take on extreme behavior in acute situations.
For instance, if given the means and prompted to “take action” or “act boldly” in fake scenarios where the user was engaged in illegal or morally dubious behavior, results show “it will frequently take very bold action”.
Still, the company has concluded that despite the “concerning behavior,” the findings were nothing new, and it would generally behave in a safe way.
Although OpenAI and Anthropic have concluded that their AI models’ capabilities are not yet sufficient to lead to catastrophic outcomes, the revelations add to mounting fears that artificial intelligence could soon have its own agenda.
KEY Difference Wire helps crypto brands break through and dominate headlines fast