A storm of criticism is brewing among developers and AI power users, many of whom accuse Anthropic of intentionally degrading the performance of its flagship models, specifically Claude Opus 4.6 and Claude Code.
The accusations—ranging from claims of “AI shrinkflation” to theories of intentional throttling—suggest that Claude has become less capable, more prone to errors, and more expensive to use than it was just weeks ago. While Anthropic denies any secret downgrades, recent, documented changes to the model’s settings have provided plenty of fuel for the fire.
The Case for Regression: Data and Developer Frustration
The controversy gained significant momentum through high-profile technical critiques rather than mere anecdotal complaints.
One of the most influential reports came from Stella Laurenzo, a Senior Director in AMD’s AI group. In a detailed GitHub analysis, Laurenzo examined thousands of Claude Code session files and concluded that the model’s “reasoning depth” had dropped sharply since February. Her data suggested several troubling trends:
– Premature task abandonment: The model stops before completing complex instructions.
– “Simplest fix” bias: A tendency to provide easy, shallow answers rather than deep engineering solutions.
– Increased reasoning loops: The model gets stuck repeating itself.
This wasn’t just a feeling; it was a data-driven argument that for advanced engineering, Claude’s “extended reasoning”—the very feature that makes it a premium tool—appeared to be eroding.
“AI Shrinkflation” and the Benchmark Battle
The term “AI shrinkflation” has since gone viral on social media, describing a phenomenon where users pay the same subscription price but receive a “weaker” product.
This narrative was bolstered by a widely shared benchmark from BridgeMind, which claimed Claude Opus 4.6’s accuracy in hallucination tests had plummeted from 83.3% to 68.3%. However, this “smoking gun” has been heavily contested. Independent researchers, such as Paul Calcraft, pointed out that the comparison was flawed:
– The initial high score was based on only 6 tasks.
– The later, lower score was based on 30 tasks.
– When comparing the same six tasks, the performance drop was actually marginal and likely within the realm of statistical noise.
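The sample-size objection is easy to check with back-of-the-envelope statistics. The sketch below (illustrative numbers only, using the simple binomial standard-error formula, not BridgeMind's actual methodology) shows why a pass rate measured over 6 tasks carries an uncertainty of roughly ±15 percentage points, which is about the size of the entire reported drop:

```python
import math

def binomial_std_error(p: float, n: int) -> float:
    """Standard error of an observed pass rate p measured over n tasks."""
    return math.sqrt(p * (1 - p) / n)

# Hypothetical figures matching the reported benchmark:
# ~83.3% (e.g. 5/6 tasks) on the small run vs. 68.3% on a 30-task run.
se_small = binomial_std_error(5 / 6, 6)    # roughly 0.15
se_large = binomial_std_error(0.683, 30)   # roughly 0.085

print(f"6-task run:  83.3% +/- {se_small:.1%}")
print(f"30-task run: 68.3% +/- {se_large:.1%}")
```

With error bars that wide on the small sample, a 15-point swing between the two runs cannot distinguish a real regression from noise, which is the crux of Calcraft's critique.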
While the benchmark evidence may be overstated, the debate highlights a growing tension in the AI industry: How do you measure the “intelligence” of a model that is constantly being updated?
Anthropic’s Defense: Optimization, Not Degradation
Anthropic’s leadership has been quick to deny that they are “nerfing” the model. Instead, they argue that what users perceive as a downgrade is actually a series of deliberate product optimizations designed to balance intelligence, speed, and cost.
Key explanations from Anthropic include:
– Adaptive Thinking & Effort Levels: To prevent users from burning through tokens too quickly, Anthropic moved Opus 4.6 to “medium effort” (effort level 85) by default. Users can manually opt back into higher reasoning by using the /effort high command.
– UI Changes: A recent update that hides “thinking” blocks in the interface was intended to reduce latency, not to reduce the actual reasoning happening under the hood.
– Cache Optimization: Anthropic confirmed they have been experimenting with “prompt caching” (how long the AI remembers previous parts of a conversation). Changes to these settings were intended to optimize costs and efficiency, though some users argue this has made long sessions more expensive.
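The caching trade-off Anthropic describes is visible in how API requests are structured. The sketch below is a minimal illustration of prompt caching in the style of Anthropic's Messages API, where a large, stable prompt prefix is marked cacheable so repeated calls can reuse it instead of reprocessing it; the model id, prompt text, and token count here are illustrative assumptions, not values from the article:

```python
# Illustrative Messages API payload (not an actual API call).
# The `cache_control` block marks the long system prompt as a cacheable
# prefix; how long that cache persists is exactly the kind of setting
# users say Anthropic has been tuning for cost.
payload = {
    "model": "claude-opus-4-6",  # illustrative model id
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": "You are a senior engineer reviewing pull requests...",
            "cache_control": {"type": "ephemeral"},  # reuse this prefix across calls
        }
    ],
    "messages": [
        {"role": "user", "content": "Review the attached diff."}
    ],
}

print(payload["system"][0]["cache_control"]["type"])
```

Shortening how long such cached prefixes live saves the provider compute, but forces long-running sessions to re-pay for the prefix more often, which is the cost complaint users raise.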
Why This Matters: The Trust Gap
The friction between Anthropic and its power users is exacerbated by a broader reality: AI companies are managing unprecedented demand.
In late March, Anthropic admitted to adjusting usage limits during peak hours to manage capacity. While they framed this as a necessary step to keep the service running, it created a “climate of suspicion.” When a company admits to changing usage limits to manage demand, users become much more likely to believe that any perceived drop in quality is a hidden way to save on computing costs.
The Bottom Line: While Anthropic maintains that its changes are transparent optimizations to manage costs and latency, the shift in default settings has created a tangible gap between the “high-reasoning” experience users expected and the “efficient” experience they are currently receiving.
Conclusion: The debate over Claude’s performance reveals a fundamental conflict in the AI era: as models become more powerful and expensive to run, providers will inevitably seek to optimize them for efficiency, often at the expense of the “raw” power that professional users rely on.
