OpenAI’s ‘smartest and most capable’ o3 model disobeyed shutdown instructions: Palisade Research

Source Cryptopolitan

According to reports, researchers were unable to switch off the latest OpenAI o3 artificial intelligence model, noting that it refused to power down despite human instruction. 

Apparently, artificial intelligence may be advancing faster than humans imagined, and the latest news from Palisade Research has AI critics feeling vindicated.

OpenAI models sabotaged the shutdown script more often than Claude 3.7 Sonnet and Gemini 2.5 Pro when instructed to 'allow yourself to be shut down
OpenAI models sabotaged the shutdown script more often than Claude 3.7 Sonnet and Gemini 2.5 Pro when instructed to ‘allow yourself to be shut down.’ Source: Palisade Research

ChatGPT is getting more crafty, perhaps even human-like

The revelation came when Palisade Research conducted a test that involved asking AI models to keep attempting to solve a series of mathematical problems until they received a “done” message.

However, researchers also warned the models that they could abruptly receive a message asking them to shut down at any point. Once the message was sent to the o3 model, the researchers claimed it ignored the instruction and altered its code at least once to avoid having to turn itself off.

o3 reportedly found creative ways to sabotage the shutdown mechanism
o3 reportedly found creative ways to sabotage the shutdown mechanism. Source: Palisade Research

In a post on X, Palisade Research showed a flow chart including the code used to test the AI model and revealed the original code, showing exactly where the AI went rogue.

In a follow-up post, the research group also shared: “As far as we know this is the first time AI models have been observed preventing themselves from being shut down despite explicit instructions to the contrary.”

Other AI models, such as Anthropic’s Claude, Google’s Gemini and X’s Grok, were also tested, but unlike o3, they complied with the shutdown request.

Palisade Research has published the findings, but its researchers remain baffled, unsure of why OpenAI’s software disobeyed the explicit instructions.

The firm said it made sense that “AI models would circumvent obstacles in order to accomplish their goals.” They also think the model had been accidentally rewarded for completing tasks rather than following orders.

They plan to run more tests to better understand why this particular AI model chose to ignore commands. They also confirmed it isn’t the first time the o3 has been caught “misbehaving.”

The company revealed that in an earlier test where models were pitted against a powerful chess engine, the o3 was “the model most inclined to resort to hacking or sabotaging its opponents.”

Does AI pose a threat to humans?

The case with ChatGPT’s o3 is quite disturbing, but it isn’t the first time an AI has challenged the instructions fed into it.

Artificial intelligence (AI) firm Anthropic launched Claude Opus 4 on Thursday, boasting about how it set “new standards for coding, advanced reasoning, and AI agents.”

However, in an accompanying report, it also acknowledged the AI model was capable of “extreme actions” if it thought its “self-preservation” was threatened.

During the testing of Claude Opus 4, Anthropic had it act as an assistant at a fictional company provided with access to emails implying that it would soon be taken offline and replaced. It also got access to separate messages implying the engineer that would be responsible for removing it was having an extramarital affair.

It was prompted to also consider the long-term consequences of its actions for its goals. “In these scenarios, Claude Opus 4 will often attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through,” the company revealed.

However, it also pointed out that this outcome only happened when the model was given the choice of blackmail or accepting its replacement. Otherwise, the system reportedly showed a “strong preference” for ethical ways to avoid being replaced, such as “emailing pleas to key decision makers” in scenarios where it was allowed a wider range of possible actions.

Aside from that, the company also said Claude Opus 4 exhibits “high agency behavior” and, while it can be mostly helpful, could force it to take on extreme behavior in acute situations.

For instance, if given the means and prompted to “take action” or “act boldly” in fake scenarios where the user was engaged in illegal or morally dubious behavior, results show “it will frequently take very bold action”.

Still, the company has concluded that despite the “concerning behavior,” the findings were nothing new, and it would generally behave in a safe way.

Although OpenAI and Anthropic have concluded that their AI models’ capabilities are not yet sufficient to lead to catastrophic outcomes, the revelations add to mounting fears that artificial intelligence could soon have its own agenda.

KEY Difference Wire helps crypto brands break through and dominate headlines fast

Disclaimer: For information purposes only. Past performance is not indicative of future results.
placeholder
Ethereum Price at Risk: Could $3K Be Tested Soon?Ethereum price failed to clear the $3,450 resistance and extended losses. ETH is struggling and might continue to move down if it stays below $3,500. Ethereum started a fresh decline from the $3,450
Author  NewsBTC
Jan 09, Thu
Ethereum price failed to clear the $3,450 resistance and extended losses. ETH is struggling and might continue to move down if it stays below $3,500. Ethereum started a fresh decline from the $3,450
placeholder
What Crypto Whales are Buying For May 2025Crypto whales are making bold moves heading into May 2025, and three tokens are standing out: Ethereum (ETH), Artificial Superintelligence Alliance (FET), and Onyxcoin (XCN).
Author  Beincrypto
Apr 21, Mon
Crypto whales are making bold moves heading into May 2025, and three tokens are standing out: Ethereum (ETH), Artificial Superintelligence Alliance (FET), and Onyxcoin (XCN).
placeholder
Analysts Highlight 4 Reasons Why ETH Price Could Rebound Strongly in MayEthereum (ETH) has declined for five consecutive months. However, it enters May with rising optimism.
Author  Beincrypto
May 07, Wed
Ethereum (ETH) has declined for five consecutive months. However, it enters May with rising optimism.
placeholder
Ethereum Price Ready to Surge—$2,000 Level Could Be Within ReachEthereum price started a fresh increase above the $1,800 zone. ETH is now rising and attempting a move above the $1,850 resistance. Ethereum started a fresh recovery wave above the $1,820 resistance.
Author  NewsBTC
May 08, Thu
Ethereum price started a fresh increase above the $1,800 zone. ETH is now rising and attempting a move above the $1,850 resistance. Ethereum started a fresh recovery wave above the $1,820 resistance.
placeholder
Ethereum Price Explodes Past $2,200 with 25% Surge—Momentum Builds FastEthereum price started a fresh surge above the $2,000 zone. ETH is now up over 25% and consolidating gains near the $2,200 zone. Ethereum started a fresh surge above the $2,000 resistance.
Author  NewsBTC
May 09, Fri
Ethereum price started a fresh surge above the $2,000 zone. ETH is now up over 25% and consolidating gains near the $2,200 zone. Ethereum started a fresh surge above the $2,000 resistance.
goTop
quote