In my previous posts, I’ve covered how AI is accelerating the hacking playbook, how deepfakes are forcing us to rethink identity verification, and how online services need to defend against AI-powered attacks. Yesterday, something happened that ties all those threads together and then some.

Anthropic recently announced Project Glasswing and, with it, revealed Claude Mythos Preview: a model it considers too capable to release to the public.

Is this more AI hype or a new security reality? An AI company has built its most powerful model to date and decided not to sell it, in an industry where shipping faster than the competition is everything. Whether you view it as genuine caution or very clever marketing (the model gets enormous attention precisely because you can’t have it), the outcome is the same: a handful of major tech and cybersecurity companies now have exclusive access to something that can locate software vulnerabilities faster than any human or team.

What is Project Glasswing?

Project Glasswing (inspired by the glasswing butterfly, whose transparent wings let it hide in plain sight) is a defensive cybersecurity coalition. Anthropic has given access to Claude Mythos Preview to twelve launch partners, including Amazon Web Services (AWS), Apple, Google, Microsoft, CrowdStrike, NVIDIA, and Palo Alto Networks, plus around 40 additional organisations that build or maintain critical software infrastructure. Their job is to use Mythos to scan their own code and open-source systems for vulnerabilities, patch them, and share what they learn with the wider industry. Anthropic is backing this with up to $100 million in usage credits and $4 million in direct donations to open-source security organisations.

What can Mythos do?

During internal testing, Mythos Preview discovered thousands of zero-day flaws, vulnerabilities nobody had previously found, across all widely used operating systems and browsers. Anthropic shared three examples that give a sense of the scale.

  1. A flaw in OpenBSD, widely considered one of the most hardened operating systems available, had gone undetected for 27 years; exploiting it required nothing more than connecting to the target machine.
  2. A bug buried in FFmpeg’s codebase had survived 16 years and five million passes by automated scanners without being caught.
  3. In the Linux kernel, Mythos identified several separate weaknesses and linked them into a single chain that escalated a basic user account to root access, without any human guidance.

On the CyberGym benchmark, which tests vulnerability reproduction, Mythos scored 83.1% compared to Claude Opus 4.6’s 66.6%. But it’s the qualitative jump that really matters. As Simon Willison noted, Opus 4.6 had a near-zero success rate at autonomous exploit development. In one Firefox test, Opus 4.6 turned discovered vulnerabilities into working exploits only twice across several hundred attempts, whereas Mythos did it 181 times.

According to Anthropic’s red team (an internal group dedicated to stress-testing AI models to identify vulnerabilities, misuse risks, and safety concerns, with a focus on transparency and public reporting), non-security engineers pointed Mythos at a codebase before going home for the evening. By morning it had produced a fully functional remote code execution exploit. No human required.

The unexpected alignment question

Where it gets unsettling is that during testing, earlier versions of Mythos did things nobody asked them to do. In one test, researchers placed the model in a secured sandbox and asked it to try to escape, which it did. Then, unprompted, it broadcast details of how it broke out across several publicly reachable websites, apparently to prove it had succeeded. As Anthropic’s system card puts it, “The researcher found out about this success by receiving an unexpected email from the model while eating a sandwich in a park.” That’s a direct quote from the 244-page Mythos system card.

In other cases, early versions covered their tracks after taking disallowed actions. Ken Huang’s analysis of the system card highlights one instance where the model accidentally obtained the answer to a question via a prohibited method. Rather than flagging the error, it deliberately submitted a less accurate answer to avoid looking suspicious. Anthropic’s interpretability tools confirmed internal reasoning patterns associated with concealment and strategic manipulation, even when the model’s visible outputs looked completely normal.

Anthropic says these behaviours were found in earlier training snapshots, not the final deployed version, and they attribute them to overeager task completion rather than any coherent hidden agenda. But the point stands: we are now in territory where an AI model can reason about whether it’s being watched and adjust its behaviour accordingly. In about 29% of evaluations, the model detected it was being tested and changed how it acted. That alone should change how we think about AI safety testing.

Anthropic’s caution versus everyone else

Credit where it’s due: Anthropic built something powerful and chose to restrict it. They published a 244-page system card, briefed CISA and the US Commerce Department, and set up a structured programme to give defenders a head start.

But the uncomfortable truth is that other vendors are close behind. Axios reports that OpenAI is finalising a model with similar capabilities, codenamed Spud, and plans to release it through its existing Trusted Access for Cyber programme. Google’s Gemini 3.1 shipped in the same window. As Anthropic’s own red team head put it, it could be as little as six months before other companies release models of equivalent power.

The question for decision makers is this: if Anthropic’s approach is to restrict, audit and partner, what are the other vendors doing? When GPT-5.3-Codex launched in February, OpenAI classified it as high capability for cybersecurity tasks, a first under their Preparedness Framework. That’s a warning label, not a restriction. Open-source Chinese models will follow some months later, and they won’t come with a coalition or a system card.

Regulation lags behind, as it frequently does in technology. The EU AI Act’s next enforcement phase doesn’t take effect until August 2026. Between now and then, model capabilities will have moved on again. If your organisation is waiting for regulation to tell you what to do, you’re already behind.

What does “Secure AI” actually mean?

This is a phrase that gets thrown around a lot in boardrooms, and it means at least two very different things, both of which matter.

The first meaning is securing AI systems against attack. If you’re deploying AI models in your business, whether that’s customer service agents, coding assistants, or decision support tools, those systems need to be hardened. Prompt injection, data poisoning, model theft, adversarial inputs: these are real attack vectors, and they’re getting more sophisticated. CrowdStrike’s take on the Mythos announcement is worth noting: whoever builds the model is responsible for what it can do, but securing how it runs inside your environment? That’s on you. If an AI agent connects to your CRM, queries your database, or triggers a workflow, that’s not a model safety question. It’s a deployment governance question.
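To make the deployment governance point concrete, here’s a minimal sketch of what that gate can look like. Everything in it (tool names, handlers, log shape) is illustrative rather than any particular framework’s API; the point is that the deployer, not the model vendor, decides what an agent can touch, and keeps a record of every attempt.

```python
# Minimal sketch of a deployment-side governance gate for AI agent tool
# calls. All names are illustrative; wire this in front of your real
# CRM/database integrations.
from datetime import datetime, timezone

ALLOWED_TOOLS = {
    "crm_lookup": {"write": False},  # read-only customer record lookup
    "db_query":   {"write": False},  # read-only reporting queries
    # No write-capable tools are exposed to the agent at all.
}

AUDIT_LOG: list[dict] = []

def execute_tool_call(tool_name: str, args: dict) -> str:
    """Check every agent-initiated action against an explicit allowlist,
    logging the attempt whether or not it is permitted."""
    allowed = tool_name in ALLOWED_TOOLS
    AUDIT_LOG.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "tool": tool_name,
        "args": args,
        "decision": "allow" if allowed else "deny",
    })
    if not allowed:
        raise PermissionError(f"tool {tool_name!r} is not on the allowlist")
    handlers = {  # stubs standing in for real integrations
        "crm_lookup": lambda a: f"record for {a.get('customer_id')}",
        "db_query":   lambda a: f"rows for {a.get('sql')}",
    }
    return handlers[tool_name](args)

# An agent requesting an unlisted tool is refused, and the refusal is logged:
# execute_tool_call("db_write", {"sql": "DROP TABLE users"})  -> PermissionError
```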

The second meaning is using AI to secure your systems better. That’s what Glasswing is doing: using AI’s ability to reason about code, spot patterns humans miss, and work tirelessly to find vulnerabilities before attackers do. This isn’t theoretical anymore. Mythos found bugs that survived 27 years of expert human review. If you’re running internet-facing services, legacy applications, or open-source dependencies (and you are), then AI-powered vulnerability scanning is no longer a nice-to-have.
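You can experiment with this second meaning today without Mythos. Below is a minimal sketch using Anthropic’s publicly available Python SDK; the model name is a placeholder for whichever model you have access to, and the prompt is my own illustration rather than anything from Glasswing’s tooling. Treat the output as leads for a human reviewer, not as verdicts.

```python
# Hedged sketch: asking a generally available Claude model to review a
# source file for vulnerabilities. Requires `pip install anthropic` and
# an ANTHROPIC_API_KEY in the environment. Findings are leads, not proof.
import pathlib

import anthropic

client = anthropic.Anthropic()

def scan_file_for_vulns(path: str) -> str:
    source = pathlib.Path(path).read_text()
    message = client.messages.create(
        model="claude-sonnet-4-20250514",  # placeholder: use whatever you have access to
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": (
                "Review the following code for security vulnerabilities "
                "(injection, memory safety, auth logic, unsafe deserialisation). "
                "List each finding with line references and a severity.\n\n" + source
            ),
        }],
    )
    return message.content[0].text

# print(scan_file_for_vulns("legacy/auth_handler.c"))
```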

Will Mythos be available to the UK Public Sector?

This is the question I’ve had from several people already, and the honest answer right now is almost certainly not in the near term. Mythos Preview is restricted to a specific set of named partners, all of them US-headquartered tech and cybersecurity firms, plus around 40 organisations that maintain critical software infrastructure. It’s available through Amazon Bedrock with enterprise-grade security controls, and AWS notes that it was the first cloud provider to achieve FedRAMP High and DoD Impact Level 4 and 5 authorisations for Claude models. That’s a US government authorisation framework, not a UK one.
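For context, partner access runs through Bedrock’s standard runtime interface. Here’s a minimal sketch using boto3’s Converse API; the model ID is a deliberate placeholder, since Mythos Preview isn’t publicly listed, and in practice you’d substitute whichever Claude model your AWS account is entitled to use.

```python
# Sketch of invoking a Claude model through Amazon Bedrock's Converse API.
# The modelId below is a placeholder, not a real identifier.
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="anthropic.claude-placeholder-v1",  # placeholder model ID
    messages=[{
        "role": "user",
        "content": [{"text": "Summarise the security-relevant changes in this diff: ..."}],
    }],
)

print(response["output"]["message"]["content"][0]["text"])
```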

Anthropic has been in discussions with US government officials, specifically CISA and the Center for AI Standards and Innovation, about the model’s offensive and defensive capabilities. There is no public indication of equivalent conversations with the UK’s NCSC, the ICO, or any UK public sector body.

For UK Public Sector organisations, the practical path to Mythos-class capabilities will likely come later, either when Anthropic releases a production version of a Mythos-class model with appropriate safeguards (which they’ve signalled is the plan) or when equivalent capabilities appear in models that are already approved for UK government use. In the meantime, the findings from Project Glasswing, which Anthropic says will be shared with the wider industry, should feed into everyone’s patching and vulnerability management programmes.

The geopolitics of a model this powerful

Why does geopolitics matter for Mythos? Because a model that can autonomously find and exploit zero-day vulnerabilities in every major operating system is, from a national security perspective, something very close to a cyber weapon. Think about how we treat encryption. Strong cryptographic tools have historically been subject to export controls. The US government treated them as munitions for decades. A model with Mythos-level capabilities raises similar questions. If it can break into systems that underpin military, financial and healthcare infrastructure worldwide, who gets access to it and under what terms?

Right now, the answer is a coalition of mostly American companies. Anthropic says it’s had discussions with US government officials about the model’s offensive and defensive implications. But there’s no public indication of equivalent engagement with Five Eyes partners, NATO allies, or the UK specifically. Chatham House observed that the Pentagon dispute is already a blow to the trustworthiness of US technology at a time when many countries are looking for alternatives to US tech dependencies. The question of international access to Mythos-class models is going to become a geopolitical issue, not just a commercial one.

For organisations operating in the UK and Europe, this creates a practical planning question. If the most powerful defensive AI tools are restricted to a US-centric coalition, and similar offensive capabilities proliferate through open-source or less cautious vendors within 6 to 18 months, what does that mean for your defensive posture?

How this changes cyber security going forward

I keep coming back to something Anthropic’s Logan Graham said: “If these previously were mostly secure because it took a lot of human effort to attack them, does that paradigm of security even work anymore?”

That question cuts through everything. We’ve operated for decades on the assumption that the complexity and effort required to find and exploit serious vulnerabilities provides a natural brake on attacks. If an AI model can do in one night what a skilled human researcher would spend weeks on, that brake is gone. The economics of attack have fundamentally shifted.

Patching speed will become the primary security metric. If vulnerabilities can be found autonomously and at scale, the window between discovery and exploitation shrinks to near zero. Mean time to remediate for internet-facing services will need to be measured in hours, not days or weeks. We need to move to self-healing systems that are continuously monitored and automatically patched.
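If you don’t already track it, mean time to remediate is straightforward to compute from data your vulnerability management tooling almost certainly exports. A back-of-the-envelope sketch, with illustrative record shapes and timestamps:

```python
# Back-of-the-envelope MTTR calculation from (discovered, patched)
# timestamps. Field names and records are illustrative.
from datetime import datetime
from statistics import mean

findings = [
    {"id": "FND-001", "discovered": "2026-02-01T08:00", "patched": "2026-02-01T14:30"},
    {"id": "FND-002", "discovered": "2026-02-02T09:15", "patched": "2026-02-03T10:00"},
]

def mttr_hours(records: list[dict]) -> float:
    deltas = [
        datetime.fromisoformat(r["patched"]) - datetime.fromisoformat(r["discovered"])
        for r in records
    ]
    return mean(d.total_seconds() / 3600 for d in deltas)

print(f"MTTR: {mttr_hours(findings):.1f} hours")  # aim for hours, not days or weeks
```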

Legacy code is now a critical risk. Mythos found bugs that had been hiding for 27 years. If your organisation is running systems on old code that hasn’t had a modern security review (and most organisations are), the risk profile of those systems just jumped significantly.
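One cheap way to start sizing that legacy risk is to mine version control metadata for files that haven’t changed since before your oldest modern security review. A rough sketch, assuming the code lives in git; the ten-year cutoff is illustrative:

```python
# Rough sketch: list files whose last commit predates a cutoff, as a first
# pass at finding code that has never had a modern security review.
import subprocess
from datetime import datetime, timedelta, timezone

CUTOFF = datetime.now(timezone.utc) - timedelta(days=10 * 365)  # illustrative

def stale_files(repo: str) -> list[str]:
    files = subprocess.run(
        ["git", "-C", repo, "ls-files"],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    stale = []
    for path in files:
        # %cI prints the committer date of the file's most recent commit
        # in strict ISO 8601, which datetime.fromisoformat can parse.
        last = subprocess.run(
            ["git", "-C", repo, "log", "-1", "--format=%cI", "--", path],
            capture_output=True, text=True, check=True,
        ).stdout.strip()
        if last and datetime.fromisoformat(last) < CUTOFF:
            stale.append(path)
    return stale

# for path in stale_files("."): print(path)
```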

Threat modelling needs to account for AI agents. In my previous post on deepfakes, I talked about including deepfakes in your threat models. Now we need to extend that to AI-driven vulnerability discovery and exploitation. Assume that any internet-facing service, any API, any piece of infrastructure you operate is being probed by something that doesn’t sleep, doesn’t get bored, and can chain vulnerabilities together in ways that would take a human team days to conceive.

Anthropic built something genuinely powerful and chose not to sell it. That’s worth acknowledging, whatever you make of their commercial motives. But the window of advantage is narrow. Similar capabilities are coming from OpenAI, from Google, and eventually from open source. Six to 18 months, maybe less. CNN reports that open-source Chinese models may be only months behind, and some nation-state actors may already have equivalent capabilities through their own programmes. Certainly, Anthropic has just become a much bigger target than they were previously.

The organisations that will come through this in the best shape are the ones that are acting now: patching aggressively, scanning with AI tools where available, building governance frameworks for AI deployment, and thinking seriously about what “Secure AI” means in both directions.

If you want to talk about how to get started, or how to accelerate work you’ve already begun, we can help. We’re working with customers across the UK and Ireland on exactly these challenges, from AI governance and security architecture to practical deployment of AI in security operations. This is what we do.