Viral BridgeBench Post Claims Claude Opus 4.6 Was ‘Nerfed,’ Critics Call It Bad Science

Source Beincrypto

BridgeMind AI claimed Anthropic’s Claude Opus 4.6 was secretly degraded after a hallucination benchmark retest. The viral post has since drawn sharp criticism for flawed methodology.

The claim triggered widespread debate over whether AI companies are quietly downgrading paid models to reduce costs.

BridgeMind Claims a 98% Surge in Hallucinations

BridgeMind, the team behind the BridgeBench coding benchmark, posted that Claude Opus 4.6 had fallen from second to tenth place on its hallucination leaderboard. Accuracy reportedly dropped from 83.3% to 68.3%.

“CLAUDE OPUS 4.6 IS NERFED. BridgeBench just proved it. Last week Claude Opus 4.6 ranked #2 on the Hallucination benchmark with an accuracy of 83.3%. Today Claude Opus 4.6 was retested and it fell to #10 on the leaderboard with an accuracy of only 68.3%,” they wrote.

The post framed this as proof of “reduced reasoning levels.” However, a closer look at the underlying data tells a different story.

Critics Say the Comparison Is Fundamentally Flawed

Computer scientist Paul Calcraft called the claim “incredibly bad science” and pointed to a critical problem with the methodology.

“Incredibly bad science You tested Opus on 30 tasks today, previous score was on just *6* tasks Results for 6 tasks in common: 85.4% score today vs. 87.6% prevly. Swing is mostly from a *single* fabrication without repeats – easily statistical noise,” commented Calcraft.

The original high score came from just six benchmark tasks. The new retest expanded the benchmark to 30 tasks.

On the six overlapping tasks, performance was nearly identical, dropping only from 87.6% to 85.4%.
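A like-for-like check of the kind Calcraft describes can be sketched in a few lines of Python. This is only an illustration: the function name, the task IDs, and the per-task scores are hypothetical placeholders, not BridgeBench data, and BridgeBench's real result format is not described in the post.

```python
def common_task_means(old_scores: dict[str, float], new_scores: dict[str, float]):
    """Average both runs over only the tasks that appear in both.

    Returns (old_mean, new_mean, number_of_common_tasks).
    """
    common = sorted(set(old_scores) & set(new_scores))
    if not common:
        raise ValueError("the two runs share no tasks, so they cannot be compared")
    old_mean = sum(old_scores[t] for t in common) / len(common)
    new_mean = sum(new_scores[t] for t in common) / len(common)
    return old_mean, new_mean, len(common)


# Hypothetical placeholder scores (NOT real benchmark results).
old_run = {"task_a": 0.9, "task_b": 0.8}
new_run = {"task_a": 0.9, "task_b": 0.7, "task_c": 0.2}

old_mean, new_mean, n_common = common_task_means(old_run, new_run)
print(f"{n_common} common tasks: {old_mean:.2f} before vs. {new_mean:.2f} after")
```

In this toy case the new run's overall average is pulled down by task_c, which the old run never attempted. Restricting both runs to shared tasks removes that effect.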

That small swing came mostly from a single extra fabrication in one task. With no repeated runs, this falls well within normal statistical variance for AI models.

Large language models are not deterministic, and one bad output on a small sample can shift results significantly.
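A rough back-of-the-envelope calculation shows why a sample this small is fragile. The sketch below assumes each task is an independent pass/fail item. That is a simplification, since the post does not describe BridgeBench's actual scoring and it appears to be finer-grained. The function names are illustrative.

```python
import math


def one_item_swing(n_tasks: int) -> float:
    """Percentage points the average moves if one pass/fail task flips."""
    return 100.0 / n_tasks


def standard_error(p: float, n_tasks: int) -> float:
    """Binomial standard error of a pass rate p over n_tasks, in percentage points."""
    return 100.0 * math.sqrt(p * (1.0 - p) / n_tasks)


for n in (6, 30):
    print(
        f"n={n}: one flipped task moves the score {one_item_swing(n):.1f} points, "
        f"standard error at an 87.6% pass rate is about {standard_error(0.876, n):.1f} points"
    )
```

Under these assumptions, the 2.2-point gap between 87.6% and 85.4% is far smaller than the sampling error expected from six tasks in a single run.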

Broader Frustrations Fuel the Narrative

Still, the post struck a nerve. Since its February 2026 launch, Claude Opus 4.6 has faced persistent complaints about perceived quality decline.

Developers report shorter responses, weaker instruction-following, and reduced reasoning depth during peak hours.

Some of this traces to deliberate product changes. Anthropic introduced adaptive thinking controls that let the model self-adjust its reasoning budget. The default effort level was later set to medium, prioritizing efficiency over maximum depth.

An independent analysis of over 6,800 Claude Code sessions found reasoning depth dropped roughly 67% by late February.

The model’s file-read ratio before editing code fell from 6.6 to 2.0. That suggests it attempted fixes on code it had barely reviewed.

What This Means for AI Users

The episode reflects a growing tension in the AI industry. Companies optimize models for cost and scale after launch, while heavy users expect consistent peak performance. The gap between those priorities erodes trust.

Based on the available evidence, the BridgeBench data does not prove a deliberate downgrade. The benchmark comparison was apples-to-oranges, and the overlapping results were nearly identical.

However, the underlying frustration is not entirely baseless. Adaptive compute controls and service-level optimizations have changed how Claude Opus 4.6 behaves in practice. For developers relying on consistent output, those changes matter.

Anthropic has not issued a public statement on the specific BridgeBench claims as of April 13.
