Anthropic Fixes Bug Causing Claude to Seem Forgetful, Repetitive

Anthropic has resolved a series of issues that caused Claude to exhibit unexpectedly poor performance, stemming from three separate changes made over the past month to Claude Code, the Claude Agent SDK, and Claude Cowork. A March 4th adjustment to Claude Code’s default “reasoning effort,” intended to reduce user interface freezes, was reversed on April 7th after users made clear they preferred higher intelligence even at the cost of slower responses. Throughout the episode, Anthropic stressed that they “never intentionally degrade our models.” A separate bug, introduced on March 26th, caused Claude to repeatedly erase its own thought process mid-session, making the model appear “forgetful and repetitive,” before being fixed on April 10th. The overlapping effects of these changes initially masked the root causes of the reported degradation.

Claude Code Reasoning Effort & Latency Tradeoffs

A deliberate reduction in Claude Code’s default reasoning effort, intended to improve speed, was reversed following direct user feedback, revealing a commitment to prioritizing intelligence over immediate responsiveness in Anthropic’s AI coding assistant. Anthropic confirmed the API remained unaffected throughout these changes, and all issues were resolved by April 20 with version 2.1.116. On March 4th, Anthropic changed Claude Code’s default “reasoning effort” from high to medium, aiming to alleviate the UI freezes some users experienced in high mode. Internal evaluations suggested medium effort delivered slightly lower intelligence with significantly less latency for most tasks, while also avoiding the very long tail latencies associated with extensive thinking. The change nevertheless proved to be a miscalculation.
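To make the tradeoff concrete, here is a minimal sketch of what a client-side default flip like this looks like; the names and settings shape are hypothetical, since Anthropic has not published Claude Code’s internal configuration schema.

```typescript
// A minimal sketch of a client-side effort default. Names and the settings
// shape are hypothetical; this is not Anthropic's actual configuration code.
type Effort = "medium" | "high" | "xhigh";

interface SessionSettings {
  effortOverride?: Effort; // present only if the user explicitly picked a level
}

// Before March 4 the fallback was "high"; the change swapped it for "medium",
// trading a small amount of intelligence for lower and flatter latency.
const DEFAULT_EFFORT: Effort = "medium";

function resolveEffort(settings: SessionSettings): Effort {
  return settings.effortOverride ?? DEFAULT_EFFORT;
}
```

Because most users never set an explicit override, flipping the fallback silently changed behavior for nearly everyone, which helps explain why the in-product explanations described below had limited effect.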

Users overwhelmingly preferred the higher intelligence of the previous high setting, and the company reverted the change on April 7, restoring xhigh effort for Opus 4.7 and high effort for other models. Anthropic said it had “rolled out a change making medium the default effort, and explained the rationale via in-product dialog,” and later “shipped a number of design iterations to make the current effort setting clearer,” but most users retained the medium default despite those attempts to highlight the option to switch back. The willingness to reverse course in response to user preference, even after internal evaluations had favored the lower setting, reflects a user-centric approach to development.

A separate and more disruptive bug, introduced during a caching optimization on March 26, caused Claude to become “forgetful and repetitive.” The optimization was meant to improve efficiency by clearing older reasoning from idle sessions, reducing costs and latency when a session was resumed. The implementation, however, erroneously cleared reasoning history on every turn instead of only after an hour of inactivity.

This resulted in Claude repeatedly erasing its thought process, leading to illogical outputs and erratic behavior. “Claude would continue executing, but increasingly without memory of why it had chosen to do what it was doing,” the company detailed. The issue was particularly insidious because it evaded initial detection: internal testing, human review, and automated verification all failed to catch the problem, which only surfaced in stale sessions. Anthropic ultimately found the root cause and deployed a fix on April 10 in version 2.1.101, aided by Opus 4.7’s ability to spot the bug in code reviews where Opus 4.6 had failed.

Even seemingly minor adjustments had unintended consequences. A system prompt change on April 16, intended to reduce verbosity, succeeded in controlling output length but demonstrably hurt coding quality, triggering a rollback on April 20. Taken together, these issues, which affected Sonnet 4.6, Opus 4.6, and Opus 4.7, initially appeared as broad, inconsistent degradation, complicating the investigation.

Caching Bug Caused Reasoning History Loss

Anthropic’s Claude models, increasingly used for complex reasoning tasks in applications like Claude Code, exhibited reported performance inconsistencies over the past month, prompting a detailed internal investigation. The company emphasized that it never intentionally degrades its models and quickly confirmed the inference layer was functioning as expected. The March 26 caching optimization was intended to reduce latency when users resumed work, leveraging prompt caching to lower costs. The design called for clearing thinking history after an hour of inactivity, pruning unnecessary messages before sending requests to the API. A bug instead caused the system to clear reasoning history on every turn of a session, not just after the idle period. As a result, Claude appeared “forgetful and repetitive,” repeatedly losing the context of its own prior actions. The continuous clearing also caused users to burn through usage limits unexpectedly fast, as each request became a cache miss.
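The failure mode is easy to see in miniature. Below is a sketch of the pruning logic the company describes, with the intended idle check and the buggy every-turn behavior side by side; the types, names, and structure are hypothetical, not Anthropic’s actual code.

```typescript
// Hypothetical sketch of the session-pruning logic described in the article.
interface Turn {
  role: "user" | "assistant";
  text: string;
  reasoning?: string;   // the model's thinking attached to that turn
  timestampMs: number;
}

const IDLE_THRESHOLD_MS = 60 * 60 * 1000; // one hour

// Intended behavior: strip reasoning only when the session has sat idle,
// so a resumed session sends a smaller, cheaper prompt to the API.
function pruneIfIdle(turns: Turn[], nowMs: number): Turn[] {
  const last = turns[turns.length - 1];
  const idle = last !== undefined && nowMs - last.timestampMs > IDLE_THRESHOLD_MS;
  if (!idle) return turns; // active session: keep reasoning intact
  return turns.map((t) => ({ ...t, reasoning: undefined }));
}

// The bug, in essence: the idle check was effectively dropped, so reasoning
// was stripped on every turn and the model lost the record of why it had
// taken its earlier actions.
function pruneBuggy(turns: Turn[]): Turn[] {
  return turns.map((t) => ({ ...t, reasoning: undefined }));
}
```

One plausible mechanism for the usage spike, consistent with the company’s account: prompt caching matches on exact prefixes, so a conversation whose reasoning blocks vanish between requests no longer matches the prefix the cache stored, turning every call into a cache miss.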

Complicating the initial diagnosis, two unrelated experiments (an internal server-side test and a change in how thinking was displayed) masked the bug in certain environments. Anthropic discovered the root cause only after more than a week of investigation, aided by running Code Review against the problematic pull requests with Opus 4.7; where Opus 4.6 had failed to identify the issue, Opus 4.7 succeeded when given complete context. To prevent recurrence, the company is expanding its code review tool to cover additional repositories.

We never intentionally degrade our models, and we were able to immediately confirm that our API and inference layer were unaffected.

Impact of System Prompt on Coding Quality

While initially challenging to isolate from normal variation in user feedback, the degradation ultimately traced back to a series of interconnected problems, each requiring its own fix, and the team is now implementing measures to prevent a recurrence. One key finding centered on how seemingly minor alterations to Claude’s core instructions, specifically its system prompt, affect its ability to generate effective code. The April 16 change was meant to rein in verbosity, and it did control output length, but it demonstrably hurt coding quality and was rolled back four days later. The finding underscores that even simple adjustments to an AI’s “personality” can have unexpected and detrimental consequences on complex tasks like code generation.
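As an illustration of how small such a change can look, the snippet below shows the general shape of a brevity instruction appended to a system prompt; the wording is invented, as Anthropic has not published the actual April 16 text.

```typescript
// Hypothetical illustration only: Anthropic has not published the actual
// wording of the April 16 system prompt change.
const baseSystemPrompt = "You are Claude Code, an agentic coding assistant.";

// Intended effect: shorter explanations between tool calls and edits.
// Observed class of side effect: brevity instructions can bleed into the
// generated artifacts themselves, e.g. terser code or omitted safeguards.
const verbosityAddendum =
  "Keep responses brief. Avoid unnecessary explanation and detail.";

const systemPrompt = `${baseSystemPrompt}\n\n${verbosityAddendum}`;
```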

Reports of Degradation & Initial Investigation

The resulting analysis revealed a complex interplay of changes, each affecting different user segments and obscuring any clear signal of widespread degradation. The March 4th reduction in default reasoning effort, intended to curb UI freezes, proved unpopular and was reverted. A more disruptive bug emerged from a March 26th optimization meant to reduce latency when resuming idle sessions; it was difficult to diagnose at first, masked by unrelated internal experiments and suppressed in most command-line interface sessions. Finally, an April 16th attempt to reduce Claude’s verbosity through a system prompt change unexpectedly harmed coding quality.

Ivy Delaney

We’ve seen the rise of AI over the last few short years with the emergence of the LLM and companies such as OpenAI with its ChatGPT service. Ivy has been working with neural networks, machine learning, and AI since the mid-1990s and writes about the latest exciting developments in the field.
