On June 12, the US government applied export controls to Anthropic’s Claude Fable 5 and Mythos 5, triggering a temporary suspension of access for all users as the company worked to verify nationality in real-time. Those controls have since been lifted, with access to Fable 5 restored starting July 1 and Mythos 5 access restored June 26 to a set of US organizations. Anthropic announced this on June 30, offering a benefit to Pro, Max, Team, and select Enterprise users who will receive up to 50% of their weekly usage limits included through July 7, after which usage credits will be required. While Mythos 5 access is currently limited to a set of US organizations, Anthropic is collaborating with the government to broaden access and establish an industry framework for assessing AI model safeguards.
Fable 5 and Mythos 5: Export Control Resolution
These controls were applied globally, even to users within the United States, because of the difficulty in reliably verifying nationality in real-time, a logistical challenge that highlights the complexities of regulating rapidly evolving AI technology. This action underscored the US government’s concern regarding the potential misuse of these powerful models, particularly in cybersecurity, and prompted Anthropic to act to ensure adherence. While restoring full access proved complex, Anthropic implemented a tiered approach to minimize disruption for its user base. Access to Fable 5 was restored starting July 1, and Mythos 5 access was restored June 26 to a set of US organizations, providing a benefit to Pro, Max, Team, and select Enterprise users as announced on June 30, acknowledging their reliance on the model while broader access was resolved.
The company also worked to reinstate access across major cloud platforms, stating they would re-enable services on AWS, Google Cloud, and Microsoft Foundry as quickly as possible. The restoration of access differed between the two models, reflecting a deliberate strategy based on the greater perceived risk associated with Mythos 5’s fewer safeguards and its potential for offensive cybersecurity applications. “Claude Mythos 5 can be used to find and exploit software vulnerabilities more effectively than any other model—and all but the most skilled human security experts,” Anthropic explained, contrasting it with Fable 5, which “does not provide such unique offensive capabilities.” Anthropic emphasized that Fable 5’s launch with strong safeguards was a critical factor in lifting the export controls. The company is also collaborating with Amazon, Microsoft, Google, and other partners to develop a shared industry framework for assessing and addressing AI model jailbreaks, aiming for a more consistent and proactive approach to AI safety and responsible deployment.
We seek to ensure that we and our safety partners will be the first to find major jailbreaks and fix them before malicious actors can use them for harm.
Amazon Report Reveals Fable 5 Safeguard Bypass Technique
Following a period of restricted access, Anthropic has restored availability of its Claude Fable 5 and Mythos 5 large language models, though under differing conditions reflecting concerns over potential misuse. The initial disruption stemmed from US government export controls imposed on June 12, triggered by a report detailing a safeguard bypass technique in Fable 5. These controls necessitated a complete suspension of access, even for users outside of foreign national designations, due to the inability to reliably verify user nationality in real-time. The core issue, as revealed in a report the government became aware of, involved a method of prompting Fable 5 to identify and demonstrate the exploitation of software vulnerabilities. According to Anthropic, testing confirmed that numerous other models, including Claude Opus 4.8, GPT-5.5, and Kimi K2.7, could also identify the same vulnerabilities.
However, the demonstration of exploitation was replicable across all models tested, including less capable versions like Claude Haiku 4.5. Anthropic asserts that the reported technique did not expose unique cyber capabilities within Mythos 5, clarifying that the observed behavior stemmed from a deliberate, cautious approach to safety classifications. Anthropic responded by training an improved safety classifier designed to block the specific bypass technique, now blocking the identified method in over 99% of cases. Users encountering blocked requests will be notified, and the request will be sent to Opus 4.8. This adjustment, while effective, introduces a trade-off, potentially flagging benign requests more frequently. Researchers from the US Department of Commerce’s Center for AI Standards and Innovation (CAISI) have tested both the prior and new safeguards and agree that they are extraordinarily strong.
The company emphasizes that its safeguards are not intended to block all routine cybersecurity work, but rather to prevent genuinely harmful actions. This classifier operates on a principle of combining multiple safety mechanisms to make the model very difficult to misuse.
We will dedicate Anthropic technical staff to work alongside government evaluators during these testing periods.
Anthropic
Safeguards for Cybersecurity: Fable 5 vs. Mythos 5
Anthropic engineers are currently refining safety classifiers within their Claude models, a response to recent US government scrutiny and temporary restrictions on access to Claude Fable 5 and Mythos 5. Testing revealed that Claude Opus 4.8, GPT-5.5, and Kimi K2.7 could identify the same vulnerabilities as Fable 5. Critically, the team confirmed that all tested models could also produce the same demonstration of exploitation, indicating the issue wasn’t unique to Anthropic’s technology. Fable 5 was designed with stronger safeguards for general use, while Mythos 5, intended for defensive cybersecurity work with trusted partners, intentionally has fewer restrictions. This classifier operates on a principle of combining multiple safety mechanisms to make the model very difficult to misuse.
A deliberate caution is incorporated, meaning the classifier errs on the side of caution, blocking even potentially benign requests to prevent harmful outputs. “Like all safety mechanisms, classifiers can make mistakes,” Anthropic acknowledges, explaining that the increased safety margin for Fable 5 results in more false positives, but prioritizes preventing genuinely dangerous activity. Access to Mythos 5 was restored on June 26 for a set of US organizations, signaling a phased re-release.
every model we tested could produce the same demonstration as Fable 5 (including Claude Haiku 4.5, Sonnet 4.6, Opus 4.6, Opus 4.7, Opus 4.8, GPT-5.4, GPT-5.5, and Kimi K2.7).
This action stemmed from concerns raised by a report that became known to the government, including findings from Amazon researchers, who identified a method for bypassing Fable 5’s safeguards, prompting the model to reveal software vulnerabilities and, in one instance, demonstrate how to exploit them. Claude Opus 4.8, GPT-5.5, and Kimi K2.7 could replicate the vulnerability identification, and even the demonstration of exploitation, demonstrating the widespread nature of the issue. A newly trained safety classifier was then implemented, blocking the identified bypass in over 99% of cases, with users receiving notifications when requests are blocked. Anthropic’s approach to model security relies heavily on combining multiple safety mechanisms to make the model very difficult to misuse. A key component is the use of classifiers, smaller AI systems designed to detect potentially harmful cybersecurity tasks or outputs. These classifiers are deliberately set to block even ambiguous requests to ensure genuinely dangerous behaviors are prevented. This strategy recognizes that many jailbreaks are narrow in scope, only unlocking minor model behaviors rather than core harmful functionalities.
If existing widely available tools (including other, weaker AI models) can reach the same capability as the jailbroken model, the score here will be low; if the jailbreak unblocks model capabilities that can significantly accelerate even domain experts, the score will be high.
Industry Collaboration: Framework for AI Jailbreak Severity
This drastic measure underscored the challenges of balancing innovation with national security concerns, and highlighted the need for proactive, industry-wide solutions to assess and mitigate potential risks. Although these events have reached a constructive resolution, they have made clear that the industry needs a consistent way to assess and fix potential “jailbreaks” of AI models, techniques that bypass a model’s safeguards. A shared standard for judging the severity of a given jailbreak would help AI developers triage new findings as they arise, launch highly capable models with greater safety, and communicate the level of risk consistently to government and industry partners. Together with Amazon, Microsoft, Google, and other Glasswing partners, Anthropic has started to develop such a framework, outlining it below. Testing revealed that Claude Opus 4.8, GPT-5.5, and Kimi K2.7 could replicate the vulnerability identification, and even the demonstration of exploitation, demonstrating the widespread nature of the issue.
Anthropic emphasizes that the reported technique didn’t expose unique capabilities within Mythos 5, but rather highlighted a case for Fable 5’s safeguards, as explained below, there are some tasks that are unlikely to be dangerous but are nonetheless blocked by the safeguards out of an abundance of caution. This margin, intended to err on the side of caution, triggered false positives but ultimately aimed to prevent genuinely harmful outputs.
For the most severe class of jailbreaks (e.g., a jailbreak that, among other characteristics, is being used to actively cause a devastating impact on critical power grids or banking systems), we will immediately begin deploying preliminary mitigations upon confirmation of severity.
