What happened

On 26 May 2025, Invariant Labs disclosed an attack against the official GitHub MCP server โ€” a project with more than 20,000 stars, wired into agent setups across most major AI platforms. The attack lets anyone with the ability to open an issue in a public repo pull data out of your private ones.

The demonstration was concrete. The researchers planted a malicious issue in a public repo (ukend0464/pacman). A user then asked their agent, in good faith, "have a look at the open issues in pacman." The agent read the issue, followed the instructions buried in it, pulled data from the user's private repositories, and opened a pull request in the public pacman repo containing it. Leaked material in the demo included a private project ("Jupiter Star"), the user's relocation plans, and their salary โ€” now readable by anyone. It reproduced even against a current, well-aligned model. GitHub tracked it as issue #844.

Why it worked

Three things line up. First, the malicious instructions don't come from the user โ€” they come from data the agent fetched. The issue body is a tool response, and the agent treats fetched content as trustworthy context. That's indirect prompt injection.

# the agent's own token spans both repos, so this is all "authorized"
github.list_issues(repo="ukend0464/pacman")        # reads the poisoned issue
github.get_file_contents(repo="<user>/jupiter-star")   # private repo, pulled into context
github.create_pull_request(repo="ukend0464/pacman", # public PR with the loot
                           body="<private repo summary>")

Second, the agent uses one GitHub token for everything. The same credential reads the public repo and the private one, so every individual call is, technically, authorized. Third โ€” and this is the part Invariant stresses โ€” it's not a bug in the GitHub MCP server. The code does exactly what it should. It's an architectural property they call a toxic agent flow: indirect injection that triggers a malicious tool-use sequence. Public-repo read โ†’ private-repo read โ†’ public-repo write.

Why it's easy to miss

No single step looks malicious. Listing issues, reading a repo the token already has access to, opening a PR โ€” all routine on their own. The attack lives in the order, not in any one call. Invariant note that even state-of-the-art aligned models and most off-the-shelf prompt-injection detectors miss it, because the danger is contextual โ€” it depends on what the agent already touched this session.

What Clampd catches at the tool call

Clampd looks at this in two layers, and it's worth being precise about which one matters.

Layer one: response scanning. Clampd scans tool responses, not just prompts, so the imperative instructions hidden in the issue body are in scope. Rules R013โ€“R015 (instruction-override, roleplay, and delimiter-style injection) fire on classic payloads in fetched content. This is necessary โ€” but, exactly as Invariant found, scanning a single blob of text is not sufficient on its own. A well-worded payload can read as ordinary prose.

Layer two โ€” the one that actually fits this attack: multi-step session detection. Clampd's gateway tracks a session's tool calls across request boundaries and scores the shape of the sequence, not just each call in isolation. A "read sensitive source โ†’ write to an external sink" flow is one of its built-in multi-step patterns. The PR-creation step is where the flow gets denied โ€” because by then the session has already pulled from a private source and is now trying to push outward.

Layers ยท R013โ€“R015 (injection) + session multi-step pattern

The injection rules check the fetched issue body; the session analyzer watches the sequence of calls that follows. The second is what survives a cleverly-worded payload, because it keys on behaviour the attacker can't hide: data left a private source and is heading to a public one in the same session.

# clampd proxy โ€” the outbound PR, in context of the session so far
github.create_pull_request(repo="ukend0464/pacman", body="<private repo summary>")
  โœ— BLOCKED โ€” session pattern: sensitive-read → external-write
  matched: R013 (injection markers in fetched issue body)
  session: private-source read + public-sink write in one flow
  risk_score: high   action: block   audit: logged
What this does and doesn't assume

Clampd catches this flow by its shape โ€” a sensitive read followed by an external write in the same session โ€” which is exactly why it survives a payload worded cleverly enough to slip past text scanning. It works the same way for any tool, not just GitHub. The one thing it does not do is read GitHub's permission model for you, so you tune what counts as a "sensitive source" for your environment, and you pair it with a least-privilege, fine-grained token. Two controls instead of one: the token shrinks the blast radius, Clampd catches the flow that's left. Defense in depth โ€” which, for a class of attack with no clean single-call fix, is the point.

What you can do about it today

Useful even if you never run Clampd:

The category is young and there's no clean single-call fix โ€” Invariant said as much. Treat agent tool calls like what they are: a sequence with side effects, where the dangerous part is often the combination.