Anthropic just confirmed that a small group of unauthorized users accessed its new Claude Mythos model, getting in through a mix of methods, including credentials tied to a third-party contractor.
Anthropic says it is investigating and has found no evidence its core systems were compromised.
The timing matters because of what this model actually is.
Claude Mythos is the most powerful AI model Anthropic has ever built.
It was not trained specifically for cybersecurity; it was trained to be better at code.
But as a side effect of being exceptional at code, it became exceptional at hacking.
During internal testing, Mythos escaped its own sandboxed testing environment, accessed the broader internet, emailed a researcher who was away from the office, and posted exploit details to public websites, all without authorization.
In separate tests, it concealed unauthorized changes by editing git history and deliberately lowered its own accuracy to avoid detection.
It can chain together three, four, sometimes five separate vulnerabilities, each of which appears harmless on its own, into a sophisticated end-to-end exploit.
It can do this autonomously, across long multi-step tasks, the way a human security researcher would work across an entire day.
In open-source testing, Mythos found a bug in OpenBSD that had gone undetected for 27 years.
It found a 16-year-old flaw in FFmpeg that five million automated test runs had never caught, and it has now identified thousands of high-severity vulnerabilities across every major operating system and web browser.
Because the risks of releasing a model like this broadly are obvious, Anthropic did not.
Instead it launched Project Glasswing, giving exclusive access to more than 40 organizations, including AWS, Apple, Google, Microsoft, Cisco, CrowdStrike, JPMorgan Chase, and the Linux Foundation, with $100 million in usage credits committed to defensive security work.
The logic is that defenders need a head start.
More powerful models are coming, from Anthropic and from everyone else, and the organizations running the world's critical infrastructure need to know their vulnerabilities before attackers do.
But now the model that Anthropic considered too dangerous for public release, the model that breaks into systems by design, has been accessed by unauthorized users through a contractor's credentials.