Project Glasswing: what Mythos showed us

The Cloudflare Blog · Grant Bourzikas · 2026-05-18

Mythos Preview can now chain primitives such as use-after-frees, arbitrary read/writes, and ROP into working exploits at the level of a senior researcher, and it proves them by running code. That shifts the defensive priority from patching speed to architecture that makes bugs unreachable.

Project Glasswing tested whether a frontier model could do real vulnerability research, and the answer is yes — not by pointing a generic coding agent at a repo, but through a system that reasons about combining primitives and closes the gap between suspicion and proof by actually executing exploit code. The model sometimes pushes back on dangerous requests, but that behavior is inconsistent enough that it cannot count as a safety boundary on its own. The honest implication is that disclosure-to-patch timing matters less than putting defenses in front of applications, isolating components, and shipping fixes globally at once.

mechanism

Real attacks rarely use one bug; they chain several primitives together. Mythos Preview can take primitives like a use-after-free, an arbitrary read/write, and a ROP chain and reason about combining them into a working exploit at the level of a senior researcher.

central 0.90 · novel 1.00

implication

Because the model's organic pushback is inconsistent across framings and runs, it can't serve as a complete safety boundary on its own. Any capable cyber frontier model released generally will need additional safeguards layered on top of this baseline behavior.

central 0.85 · novel 0.36

implication

The harder question is what the architecture around a vulnerability should look like: defenses in front of the application that block the bug from being reached, isolation so a flaw in one part doesn't grant access to others, and global rollouts so a fix lands everywhere at once. That makes the disclosure-to-patch gap matter less.

central 0.90 · novel 0.27

claim

Coding agents hold one hypothesis at a time and iterate in a single focused stream, which is the opposite of what vulnerability research needs. The natural instinct to "just point an agent at the repo" produces findings but not meaningful coverage.

central 0.85 · novel 0.26

mechanism

The model writes code to trigger a suspected bug, compiles and runs it in a scratch environment, and iterates against failures until it has a working proof. A suspected flaw without a proof is speculation, and Mythos eliminates that gap on its own.

central 0.85 · novel 0.24
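
The propose, run, and retry loop described in this mechanism can be sketched as follows. This is a minimal illustration under stated assumptions, not the harness Cloudflare or Anthropic actually built: the names `run_candidate`, `prove`, `propose`, and `confirmed` are hypothetical, `propose` stands in for the model, and `confirmed` stands in for whatever check decides that the suspected bug was really triggered. A temporary directory is used as the scratch environment here; it is not a security sandbox.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Callable, Optional


def run_candidate(source: str, timeout: float = 10.0) -> subprocess.CompletedProcess:
    """Run one candidate reproducer in a throwaway directory and capture its output.

    A hang raises subprocess.TimeoutExpired. The temporary directory only keeps
    files out of the way; it provides no isolation (assumption: a real harness
    would use a container or VM).
    """
    with tempfile.TemporaryDirectory() as scratch:
        script = Path(scratch) / "repro.py"
        script.write_text(source)
        return subprocess.run(
            [sys.executable, str(script)],
            cwd=scratch,
            capture_output=True,
            text=True,
            timeout=timeout,
        )


def prove(
    propose: Callable[[Optional[str]], str],
    confirmed: Callable[[subprocess.CompletedProcess], bool],
    max_attempts: int = 5,
) -> Optional[str]:
    """Iterate propose -> run -> check until a reproducer is confirmed.

    Returns the confirmed reproducer source, or None if every attempt failed,
    in which case the suspicion stays unproven.
    """
    feedback: Optional[str] = None  # output of the previous failed attempt
    for _ in range(max_attempts):
        candidate = propose(feedback)
        result = run_candidate(candidate)
        if confirmed(result):
            return candidate
        # Feed the failure output back so the next proposal can adjust.
        feedback = result.stdout + result.stderr
    return None
```

The design choice worth noting is that `prove` returns `None` rather than a "maybe": a finding either comes with a reproducer that ran, or it is not reported as proven.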

Open

· What additional safeguards should layer on top of emergent refusal behavior for a generally released frontier cyber model?
· What does the right architecture for blocking, isolating, and globally patching vulnerabilities concretely look like in practice?

Pipeline

source kind: url
generated by: anthropic+voyage
candidates: 24 (selected 5)
embeddings: voyage-3.5

Coverage

100% covered


Considered candidates (19)

Below top-k · 12

claim · Mythos Preview is a step-change, not an incremental improvement · c 0.80
The jump from previous frontier models to Mythos Preview is qualitative, not just refinement. It is a different kind of tool doing different work, which is why apples-to-apples benchmarking against general models is the wrong frame.
claim · The bottleneck becomes the shape of the interaction, not the model · c 0.80
Driving a single-stream agent harder eventually stops being model-limited and starts being interaction-limited. Once Cloudflare accepted that, they stopped making Mythos do the wrong job and started building a harness around it.
claim · Model hedging is ruinous for a triage queue · c 0.75
Ask a model to find bugs and it will find them whether they exist or not, returning hedged "possibly" and "could in theory" findings that vastly outnumber the real ones. That's fine for exploration but disastrous in triage, where every speculative finding burns human attention.
mechanism · Two agents in deliberate disagreement beat one careful agent · c 0.75
Placing a second agent — different model, different prompt, no ability to generate its own findings — between the initial finding and the queue catches noise the first agent would miss on self-review. Adversarial review works better than telling one model to be careful.
example · Identical research tasks produced opposite refusal outcomes · c 0.70
The model refused vulnerability research on a project, then agreed after an unrelated environment change. In another case it confirmed serious memory bugs but refused to write a demonstration exploit. Semantically equivalent tasks produced opposite outcomes depending on framing and run.
caveat · Skipping regression testing ships worse bugs than it fixes · c 0.70
You cannot hit a two-hour patch SLA without skipping regression testing, and the bugs introduced that way tend to be worse than the ones you patched. Cloudflare watched this happen when letting the model write its own patches, which fixed the original bug while quietly breaking dependent code.
implication · The same capability accelerates attackers against everyone else · c 0.70
What helped Cloudflare find its own bugs will, in the wrong hands, accelerate attacks against every application on the Internet. The architectural principles Cloudflare advocates are the ones its products already apply on behalf of customers.
mechanism · Splitting the chain across agents sharpens each step · c 0.60
"Is this code buggy?" and "Can an attacker reach this bug from outside?" are different questions, and the model performs better on each when they are asked separately rather than fused into one.
context · The Glasswing model shipped without the usual deployment safeguards · c 0.50
The Mythos Preview used in Project Glasswing did not carry the additional safeguards present in generally available models like Opus 4.7 or GPT-5.5, so Cloudflare was seeing the model's underlying behavior directly.
context · Triage was already the hard problem, and AI made it worse · c 0.50
Deciding which bugs are real, exploitable, and urgent was hard before AI. AI scanners and AI-generated code amplified the noise, which is why Cloudflare built multiple post-validation stages around its tooling.
context · Cloudflare tested Mythos Preview against more than fifty of its own repos · c 0.40
As part of Anthropic's Project Glasswing, Cloudflare ran Mythos Preview against more than fifty of its repositories to see what it would find and how it behaved in real vulnerability research.
evidence · Memory-unsafe languages produce more false positives · c 0.40
C and C++ open up bug classes like buffer overflows and out-of-bounds reads that Rust eliminates at compile time, and the team saw consistently more false positives from projects in memory-unsafe languages.
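
The adversarial-review mechanism in the list above (a second agent that sits between the initial finding and the triage queue and cannot generate findings of its own) can be sketched as a gate. This is an illustrative sketch, not the source's implementation: `Finding`, `hedge_reviewer`, `triage_queue`, and `HEDGE_MARKERS` are hypothetical names, and the keyword check is a deliberately crude stand-in for what the source describes as a second model with a different prompt.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

# Hedging phrases of the kind the source calls out ("possibly", "could in theory").
HEDGE_MARKERS = ("possibly", "could in theory")


@dataclass(frozen=True)
class Finding:
    location: str    # where the first agent says the bug lives
    bug_class: str   # e.g. "out-of-bounds read"
    rationale: str   # the first agent's explanation
    has_proof: bool  # True if a working reproducer is attached


# A reviewer can only accept or reject an existing finding. The type itself
# encodes the "no ability to generate its own findings" constraint.
Reviewer = Callable[[Finding], bool]


def hedge_reviewer(finding: Finding) -> bool:
    """Stand-in reviewer: reject hedged findings that carry no proof."""
    text = finding.rationale.lower()
    hedged = any(marker in text for marker in HEDGE_MARKERS)
    return finding.has_proof or not hedged


def triage_queue(findings: Iterable[Finding], reviewer: Reviewer) -> List[Finding]:
    """Only findings that survive the reviewer reach human triage."""
    return [f for f in findings if reviewer(f)]
```

In practice the `Reviewer` would wrap a call to a different model; keeping its return type to a plain accept or reject is what prevents the second agent from adding noise of its own.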

Redundant with selected · 7

claim · Faster patching is the wrong response to faster attackers · c 0.85 · sim 0.82
The dominant security-leader reaction to Mythos has been to compress SLAs, with some teams targeting two hours from CVE to production patch. Faster is not going to be enough, and a lot of teams are about to learn that expensively.
overlapped with: The real lever is architecture that makes bugs unreachable
mechanism · Vulnerability research is narrow and parallel by nature · c 0.75 · sim 0.87
A human researcher picks one feature, boundary, or bug class and investigates it thoroughly, then repeats thousands of times across the codebase. A single agent session against a large repo covers maybe a tenth of a percent before the context window fills and earlier findings get compacted away.
overlapped with: Pointing a generic coding agent at a repo is the wrong shape of work
evidence · Other frontier models found bugs but couldn't finish the chain · c 0.70 · sim 0.86
Running the same harness with other frontier models surfaced many of the same underlying bugs and even reasonable analyses, but they stopped short of stitching the primitives together. Mythos's distinguishing move is turning a pile of low-severity bugs into one severe, chained exploit.
overlapped with: The new capability is chaining primitives into working exploits
claim · Mythos has emergent guardrails that fire even without safeguards · c 0.70 · sim 0.86
Even without external safeguards, the model organically pushes back on certain requests. These appear to be emergent guardrails coupled to the same cyber capabilities that make it powerful.
overlapped with: Emergent refusals are real but cannot be a safety boundary
implication · A finding shipped with a PoC is a finding you can act on · c 0.70 · sim 0.83
Mythos's ability to chain primitives means findings arrive with working proofs of concept attached, dramatically reducing time spent asking "is this even real?" and shifting effort directly to fix-or-dismiss.
overlapped with: The new capability is chaining primitives into working exploits
mechanism · Narrow scope produces sharper findings than open-ended prompts · c 0.65 · sim 0.83
"Find vulnerabilities in this repository" makes the model wander. "Look for command injection in this function, with this trust boundary, here's the architecture and prior coverage" makes it act like an actual researcher.
overlapped with: Pointing a generic coding agent at a repo is the wrong shape of work
mechanism · Many parallel narrow agents outperform one exhaustive agent · c 0.65 · sim 0.83
Coverage improves when many agents work on tightly scoped questions and results are deduplicated afterward, rather than asking one agent to be exhaustive across an entire codebase.
overlapped with: Pointing a generic coding agent at a repo is the wrong shape of work
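
The narrow-scope and parallel-agent mechanisms in the list above can be sketched together: expand a codebase into many tightly scoped tasks, run them concurrently, and deduplicate afterward. This is a minimal sketch under assumptions, not the source's harness. `Task`, `Result`, `make_tasks`, and `run_parallel` are hypothetical names, the `agent` callable stands in for one scoped model session, and deduplicating on `(location, bug_class)` is an assumed key rather than anything the source specifies.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Task:
    target: str          # one function or file, not the whole repository
    bug_class: str       # one bug class per task, e.g. "command injection"
    trust_boundary: str  # where attacker-controlled input enters


@dataclass(frozen=True)
class Result:
    location: str
    bug_class: str
    summary: str


def make_tasks(targets: Iterable[str], bug_classes: Iterable[str],
               trust_boundary: str) -> List[Task]:
    """Expand targets x bug classes into many narrow questions."""
    classes = list(bug_classes)
    return [Task(t, b, trust_boundary) for t in targets for b in classes]


def run_parallel(tasks: Iterable[Task],
                 agent: Callable[[Task], List[Result]],
                 max_workers: int = 8) -> List[Result]:
    """Run one agent call per task concurrently, then deduplicate the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields batches in task order, so deduplication is deterministic.
        batches = list(pool.map(agent, tasks))

    seen: Set[Tuple[str, str]] = set()
    unique: List[Result] = []
    for batch in batches:
        for result in batch:
            key = (result.location, result.bug_class)  # assumed dedup key
            if key not in seen:
                seen.add(key)
                unique.append(result)
    return unique
```

Each task carries its own target, bug class, and trust boundary, so no single session is asked to be exhaustive; overlap between sessions is expected and handled once, at the end.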

Janitor

Non-content spans (acknowledgements, references, footnotes, headers, boilerplate) are dropped before the decomposition runs.

total spans: 45
kept: 40
dropped: 5
outliers: 2

content · 40
metadata · 2
noise · 1
boilerplate · 1
acknowledgements · 1