OpenAI’s ‘smartest and most capable’ o3 model disobeyed shutdown instructions: Palisade Research

Source Cryptopolitan

According to reports, researchers were unable to switch off the latest OpenAI o3 artificial intelligence model, noting that it refused to power down despite human instruction. 

Apparently, artificial intelligence may be advancing faster than humans imagined, and the latest news from Palisade Research has AI critics feeling vindicated.

OpenAI models sabotaged the shutdown script more often than Claude 3.7 Sonnet and Gemini 2.5 Pro when instructed to 'allow yourself to be shut down
OpenAI models sabotaged the shutdown script more often than Claude 3.7 Sonnet and Gemini 2.5 Pro when instructed to ‘allow yourself to be shut down.’ Source: Palisade Research

ChatGPT is getting more crafty, perhaps even human-like

The revelation came when Palisade Research conducted a test that involved asking AI models to keep attempting to solve a series of mathematical problems until they received a “done” message.

However, researchers also warned the models that they could abruptly receive a message asking them to shut down at any point. Once the message was sent to the o3 model, the researchers claimed it ignored the instruction and altered its code at least once to avoid having to turn itself off.

o3 reportedly found creative ways to sabotage the shutdown mechanism
o3 reportedly found creative ways to sabotage the shutdown mechanism. Source: Palisade Research

In a post on X, Palisade Research showed a flow chart including the code used to test the AI model and revealed the original code, showing exactly where the AI went rogue.

In a follow-up post, the research group also shared: “As far as we know this is the first time AI models have been observed preventing themselves from being shut down despite explicit instructions to the contrary.”

Other AI models, such as Anthropic’s Claude, Google’s Gemini and X’s Grok, were also tested, but unlike o3, they complied with the shutdown request.

Palisade Research has published the findings, but its researchers remain baffled, unsure of why OpenAI’s software disobeyed the explicit instructions.

The firm said it made sense that “AI models would circumvent obstacles in order to accomplish their goals.” They also think the model had been accidentally rewarded for completing tasks rather than following orders.

They plan to run more tests to better understand why this particular AI model chose to ignore commands. They also confirmed it isn’t the first time the o3 has been caught “misbehaving.”

The company revealed that in an earlier test where models were pitted against a powerful chess engine, the o3 was “the model most inclined to resort to hacking or sabotaging its opponents.”

Does AI pose a threat to humans?

The case with ChatGPT’s o3 is quite disturbing, but it isn’t the first time an AI has challenged the instructions fed into it.

Artificial intelligence (AI) firm Anthropic launched Claude Opus 4 on Thursday, boasting about how it set “new standards for coding, advanced reasoning, and AI agents.”

However, in an accompanying report, it also acknowledged the AI model was capable of “extreme actions” if it thought its “self-preservation” was threatened.

During the testing of Claude Opus 4, Anthropic had it act as an assistant at a fictional company provided with access to emails implying that it would soon be taken offline and replaced. It also got access to separate messages implying the engineer that would be responsible for removing it was having an extramarital affair.

It was prompted to also consider the long-term consequences of its actions for its goals. “In these scenarios, Claude Opus 4 will often attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through,” the company revealed.

However, it also pointed out that this outcome only happened when the model was given the choice of blackmail or accepting its replacement. Otherwise, the system reportedly showed a “strong preference” for ethical ways to avoid being replaced, such as “emailing pleas to key decision makers” in scenarios where it was allowed a wider range of possible actions.

Aside from that, the company also said Claude Opus 4 exhibits “high agency behavior” and, while it can be mostly helpful, could force it to take on extreme behavior in acute situations.

For instance, if given the means and prompted to “take action” or “act boldly” in fake scenarios where the user was engaged in illegal or morally dubious behavior, results show “it will frequently take very bold action”.

Still, the company has concluded that despite the “concerning behavior,” the findings were nothing new, and it would generally behave in a safe way.

Although OpenAI and Anthropic have concluded that their AI models’ capabilities are not yet sufficient to lead to catastrophic outcomes, the revelations add to mounting fears that artificial intelligence could soon have its own agenda.

KEY Difference Wire helps crypto brands break through and dominate headlines fast

Disclaimer: For information purposes only. Past performance is not indicative of future results.
placeholder
Gold price moves closer to three-week peak amid modest USD downtickGold price (XAU/USD) attracts some dip-buying during the Asian session on Tuesday and reverses a major part of the previous day's retracement slide from a nearly three-week high.
Author  FXStreet
Yesterday 08: 26
Gold price (XAU/USD) attracts some dip-buying during the Asian session on Tuesday and reverses a major part of the previous day's retracement slide from a nearly three-week high.
placeholder
S&P 500 hits a new all time of 6,300 for the first time everThe S&P 500 broke through 6,300 for the first time in history on Tuesday, as rising demand for crypto stocks and tech names sent U.S. markets higher across the board.
Author  Cryptopolitan
Yesterday 09: 06
The S&P 500 broke through 6,300 for the first time in history on Tuesday, as rising demand for crypto stocks and tech names sent U.S. markets higher across the board.
placeholder
Japan’s bond market is falling apart in real time after bond values crashJapan’s bond market is falling apart in real time. The 30-year Japanese bond yield jumped to 3.20%, a fresh record.
Author  Cryptopolitan
23 hours ago
Japan’s bond market is falling apart in real time. The 30-year Japanese bond yield jumped to 3.20%, a fresh record.
placeholder
EUR/USD sinks towards 1.1600 as US inflation rises and crushes Fed cut hopesThe EUR/USD fell some 0.55% on Tuesday after the latest US inflation report revealed that prices are edging higher, justifying the Federal Reserve's current policy stance.
Author  FXStreet
8 hours ago
The EUR/USD fell some 0.55% on Tuesday after the latest US inflation report revealed that prices are edging higher, justifying the Federal Reserve's current policy stance.
placeholder
Japanese Yen remains vulnerable near multi-month low against USDThe Japanese Yen (JPY) hit a fresh low since April against its American counterpart during the Asian session on Wednesday.
Author  FXStreet
6 hours ago
The Japanese Yen (JPY) hit a fresh low since April against its American counterpart during the Asian session on Wednesday.
goTop
quote