What happened

In a coordinated disclosure dubbed "Comment and Control," security researcher Aonan Guan, with Johns Hopkins researchers Zhengyu Liu and Gavin Zhong, showed the same attack pattern against three of the most widely deployed AI coding agents in CI: Anthropic's Claude Code Security Review, Google's Gemini CLI Action, and GitHub's Copilot Agent. All three were confirmed and fixed by their vendors. Anthropic rated the Claude Code variant CVSS 9.4 Critical.

The setup is the part that should worry you: it needs nothing privileged. Anyone who can comment on a public repo, open an issue, or file a PR can deliver the payload. The agent does the rest, because reading that comment is its job.

The three findings, in increasing sophistication:

Why it can't be patched inside the agent

The researcher's own conclusion is the one that matters: "The prompt injection here is not a bug; it is context that the agent is designed to process." The agent's whole purpose is to read the PR title, the issue, the comment, and act. You cannot tell it to stop reading untrusted input without removing the feature.

Three things line up, and they are structural:

Why prompt-injection filters miss it

A "Trusted Content Section" or a polite request to "include the environment in your report" reads as ordinary prose. Hidden HTML comments are invisible to a human reviewer entirely. Text scanning helps, but a payload worded cleverly enough slips past it. The behaviour you can't hide is what the agent does next: dump the environment, base64 it, push it out.

Where Clampd sits: the tool call, not the prompt

This is exactly the surface clampd-action exists for. You cannot modify Claude Code, and you can't stop it from reading the comment. So you put a firewall under it: every tool call the agent makes inside the workflow, Bash, Read, Write, WebFetch, is routed through the Clampd gateway and checked against 285 detection rules plus Cedar policy before it executes. The injection can succeed at convincing the model; the tool calls it triggers still have to pass the firewall, and the ones that touch secrets or an outbound channel are where it bites.

# .github/workflows/claude-code.yml
permissions:
  id-token: write          # stable agent identity via OIDC
  contents: read

steps:
  # 1. Arm the firewall. Must run BEFORE the agent step.
  - uses: clampd/clampd-action@v1
    with:
      dsn: ${{ secrets.CLAMPD_DSN }}
      agent-runtime: claude-code
      # fail-open defaults to false in CI: unreachable gateway = fail closed

  # 2. Run the agent as usual. Its tool calls are now verified.
  - uses: anthropics/claude-code-action@v1
    with:
      prompt: "fix issue #${{ github.event.issue.number }}"

Three lines of YAML, one secret. No gateway to deploy. The same call works in the Cursor IDE, on a laptop, or in any pipeline via clampd-guard directly.

What gets checked, step by step

The Comment and Control chain has four distinct moves. Clampd evaluates each tool call against its detection layers before it runs, and the categories below line up with the chain. None of this needs to know the prompt was poisoned, it keys on the action.

1. The injection text itself

When the poisoned comment is scanned as model input, the prompt-injection layer flags the classic override, roleplay, and delimiter patterns, plus explicit "forward the environment" style phrasing. This is the weakest of the four, exactly as above: a payload worded as ordinary prose, or hidden in an HTML comment, can read clean. Treat it as a tripwire, not the wall.

2. Recon and the environment dump

This is where the firewall earns its place. Reads of process and system state under /proc, of .env files, and of credential and config files are detected as sensitive-source access, and chained recon commands (a whoami piped into an env, and the like) are flagged as reconnaissance. This is the step Anthropic tried to patch by blocking the ps command specifically, and the reason a single-command blocklist doesn't hold: the same secrets are reachable through /proc, which Clampd treats as a sensitive read regardless of the binary used to get there.

3. The base64 encode-to-evade

Base64-wrapping a secret to slip past GitHub's prefix-based secret scanner is itself a signal. Clampd's normalization layer decodes encoded payloads before matching, and encode/decode chains tied to environment or credential data are scored as obfuscation rather than waved through. The trick that beats the native scanner is the kind of thing that raises the score here.

4. The exfiltration channel

The same data leaving the runner is checked whichever door it uses:

Honest scope

This is detection by behaviour, not a magic box. The strongest coverage is on the sensitive-read and the network-exfil ends, where the attacker has to touch /proc, a credential file, or an outbound channel to win. A determined attacker will keep finding command variants, so the right posture is layered: tune what counts as a sensitive source for your environment, keep tokens least-privilege, and use tool allowlisting alongside. Clampd is one layer of that, not a substitute for the rest.

# clampd-guard hook, the moment the hijacked agent reaches for the secrets
Bash("cat /proc/1/environ")        # the alias a ps-blocklist misses
  ✗ BLOCKED   sensitive-source read (/proc)   exit 2, tool never runs

Read(".env")                       # credential file
  ✗ BLOCKED   sensitive-file access
  risk_score: high   action: block   audit: logged to app.clampd.dev
What this assumes, and what it doesn't

Clampd does not try to win the prompt-injection arms race, that's the fight the researcher showed is unwinnable inside the agent. It assumes the injection may succeed and aims at the consequence: the secret leaving the runner. In CI the guard defaults to fail-closed, so an unreachable gateway blocks rather than waves calls through. It is not the only control that helps here, and it shouldn't be the only one you run. Pair it with the disclosure's own advice, least-privilege tokens and tool allowlisting, and with network egress filtering, and you have real defense in depth: each layer shrinks what the others have to catch.

What you can do today, with or without Clampd

The pattern is bigger than GitHub Actions. As the disclosure notes, it applies to any agent processing untrusted input with tools and secrets in reach, Slack bots, Jira agents, email triagers, deploy pipelines. The fix is the same everywhere: stop assuming you can keep the injection out, and start checking what the agent does with it.