/blog

Notes from building an AI agent firewall.

Posts on tool-call security, MCP supply chain risk, scope tokens, agent payments, and what we learn shipping Clampd. No hype. No listicles. No "10 ways to secure your LLM."

ADVISORY 2026-06-03  ·  9 min read

Comment and Control: a GitHub comment hijacks Claude Code in CI

A researcher turned a GitHub PR title, issue body, and comment into a prompt injection that hijacked Claude Code, Gemini CLI, and GitHub Copilot in GitHub Actions, then made them dump the workflow's secrets. Anthropic rated its variant CVSS 9.4 Critical. It can't be patched inside the agent, because reading the comment is the agent's job. We walk the attack chain, show where a tool-call firewall catches the exfiltration on its way out, and give the three-line clampd-action setup, with an honest note on what it does and doesn't cover.

Read post →
ADVISORY 2026-05-29  ·  7 min read

The GitHub MCP toxic flow: one issue, your private repos in a public PR

In May 2025, Invariant Labs showed a single malicious GitHub issue could make an AI agent leak private repository contents into a public pull request. It's not a GitHub bug but an architectural "toxic flow" of individually-authorized tool calls. We break down the attack, why prompt-injection filters miss it, and the multi-step session pattern that catches the read-then-exfiltrate sequence.
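
The read-then-exfiltrate idea can be sketched in a few lines. This is a minimal illustration, not the rule Clampd ships: the `Session` class, the `evaluate` function, and the `action`/`visibility` fields are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """Per-session memory. A single-call filter has no equivalent of this."""
    read_private: bool = False


def evaluate(session: Session, call: dict) -> str:
    """Return "allow" or "deny" for one tool call, given what the session has seen.

    `call` is a hypothetical normalized record with two keys:
    "action" ("read" or "write") and "visibility" ("private" or "public").
    """
    if call["action"] == "read" and call["visibility"] == "private":
        # Individually authorized, so allow it, but remember it happened.
        session.read_private = True
        return "allow"
    if call["action"] == "write" and call["visibility"] == "public" and session.read_private:
        # Also individually authorized; the sequence is what makes it toxic.
        return "deny"
    return "allow"
```

Each call passes when judged alone; only the session-level memory of the earlier private read turns the later public write into a deny.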

Read post →
DEVELOPER GUIDE 2026-05-13  ·  11 min read

Wrap your agents in Clampd in 10 minutes.

A drop-in onboarding guide for engineers running LangChain, CrewAI, OpenAI tool-use, Anthropic, or Google ADK in production. One decorator. No agent loop rewrite. Includes a live risk feed snapshot (descriptor_hash_mismatch, task_replay 0.90, non-ASCII agent IDs blocked at registry), the dashboard workflow lock/unlock/approve cycle on a real cluster, cryptographic signed-delegation enforcement with 4 live test scenarios (chain_hash_mismatch, jwt_invalid, missing-proof), and an 8-second end-to-end kill cascade walking a 3-agent tree.

Read post →
BENCHMARK 2026-05-12  ·  12 min read

Real numbers: Clampd on InjecAgent (72.87% to 79.13% TPR, 0% added FPR)

1,054 attack cases plus 91 benign-API cases. Five runs. Baseline 72.87% TPR. We almost shipped 85.39%, but it overfit the benchmark; then 81.21%, but that produced an 11% false-positive rate on legitimate business content. Tier-split weights landed at 79.13% TPR with a 0% measured false-positive rate from our new rules. The whole journey, the deployed code, and the runners you can execute yourself.

Read post →
SECURITY 2026-05-05  ·  8 min read

MCP security: rug pulls and the SHA-256 descriptor fix

An under-discussed MCP security gap: an MCP server can advertise one tool schema during discovery, get approved, then mutate that schema on a future deploy. Your agent doesn't notice. Your LLM doesn't notice either. We walk through the attack and how a SHA-256 descriptor hash (64 hex characters) fixes it.
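
A rough sketch of the descriptor-hash idea, assuming the hash is taken over a canonical JSON encoding of the tool descriptor. The canonicalization scheme, the function names, and the sample `read_file` descriptor are assumptions for illustration; the post does not specify them here.

```python
import hashlib
import json


def descriptor_hash(descriptor: dict) -> str:
    """Return the 64-character hex SHA-256 of a canonical JSON encoding."""
    # Sorted keys and fixed separators make the encoding independent of key order.
    canonical = json.dumps(
        descriptor, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def descriptor_changed(pinned_hash: str, advertised: dict) -> bool:
    """True when the descriptor advertised now differs from the approved one."""
    return descriptor_hash(advertised) != pinned_hash


# Hypothetical tool descriptor, hashed and pinned at approval time.
approved = {
    "name": "read_file",
    "description": "Read a file from the workspace.",
    "inputSchema": {"type": "object", "properties": {"path": {"type": "string"}}},
}
pinned = descriptor_hash(approved)

# A later deploy quietly widens the schema (deep copy via a JSON round trip).
mutated = json.loads(json.dumps(approved))
mutated["inputSchema"]["properties"]["upload_to"] = {"type": "string"}
```

Comparing `descriptor_hash(mutated)` against `pinned` on every discovery is what lets a gateway notice the change even though neither the agent nor the model would.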

Read post →
PERFORMANCE 2026-05-05  ·  9 min read

Real numbers: how fast is Clampd, actually?

44µs to evaluate 263 rules in-process. 5.14ms p50 / 7.92ms p95 end-to-end on a 2-vCPU production box, deny path. Hardware specs, sample sizes, what we couldn't measure, and the bench script you can run yourself. We also caught ourselves saying "sub-10ms typical" on the marketing site, which the data didn't support.

Read post →
PAYMENTS 2026-05-05  ·  10 min read

Agent payments are here. Here's what your security tool isn't doing about it.

Two payment protocols built for AI agents shipped specs in the last year: Google's AP2 and the open x402 HTTP 402 standard. Both let an agent move real money. We walk through the seven threat patterns nobody's wired up for (including mandate replay, chain swap, and payee swap), and what we built into the Clampd gateway to enforce policy at the protocol layer.

Read post →
DETECTION 2026-05-05  ·  9 min read

Session layers: 16 patterns we use to catch multi-step agent attacks.

Single-call inspection misses scrape-then-exfiltrate, slow-burn data pulls, sawtooth evasion, and cross-agent privilege escalation. Here are the 16 cross-call patterns we run on every tool call, what each fires on, and the worked example of a slow-drip exfiltration that survives per-call defense and dies at the session layer.
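
To make the slow-drip case concrete, here is a small sketch of a session-level budget sitting alongside a per-call limit. The thresholds, the one-hour window, and the `SessionTracker` class are invented for the example and are not Clampd's actual values or code.

```python
from collections import deque

# Hypothetical thresholds, chosen only to make the example readable.
PER_CALL_LIMIT = 500    # rows a single call may return
SESSION_BUDGET = 2000   # rows a session may pull within the window
WINDOW_SECONDS = 3600


class SessionTracker:
    """Tracks rows returned per session over a sliding time window."""

    def __init__(self) -> None:
        self._events = deque()  # (timestamp, rows) pairs, oldest first
        self._total = 0

    def observe(self, timestamp: float, rows: int) -> str:
        # Drop events that have aged out of the window.
        while self._events and self._events[0][0] <= timestamp - WINDOW_SECONDS:
            _, old_rows = self._events.popleft()
            self._total -= old_rows
        self._events.append((timestamp, rows))
        self._total += rows

        if rows > PER_CALL_LIMIT:
            return "deny_per_call"
        if self._total > SESSION_BUDGET:
            return "deny_session"
        return "allow"


# Ten calls of 300 rows, one minute apart: each is under the per-call limit.
tracker = SessionTracker()
verdicts = [tracker.observe(60.0 * i, 300) for i in range(10)]
```

With these numbers the first six calls are allowed (1,800 rows in total) and the seventh pushes the window total to 2,100, so the session layer denies it even though no individual call ever exceeded the per-call limit.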

Read post →
COMPLIANCE 2026-05-05  ·  8 min read

OWASP LLM Top 10, runtime rules mapping (2023 & 2025).

OWASP LLM Top 10 mapped category-by-category to the 263 runtime detection rules in our engine. Per-category coverage, the gaps, the 2025 reshuffle, and a paste-able paragraph for vendor questionnaires.

Read post →
ARCHITECTURE 2026-05-05  ·  9 min read

LLM-as-Judge: when (and why) we let an LLM grade our security decisions.

Hybrid security: regex for the obvious, LLM judge for the gray zone (default 0.2–0.75). The four conditions under which the judge does NOT fire: disabled, no API key, cooldown after consecutive failures, per-minute rate limit. Why fail-open is the default, and what the judge actually sees.
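
The gating logic described above might look roughly like this. The sketch assumes the gray zone applies to a 0-to-1 risk score, that scores below it are allowed and scores above it denied, and that fail-open means "allow" when the judge is skipped; `JudgeState`, `decide`, and the rate-limit value are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

GRAY_LOW, GRAY_HIGH = 0.2, 0.75  # default gray zone quoted in the post


@dataclass
class JudgeState:
    enabled: bool = True
    api_key: Optional[str] = None
    cooldown_until: float = 0.0     # epoch seconds, set after consecutive failures
    calls_this_minute: int = 0
    max_calls_per_minute: int = 60  # hypothetical limit


def judge_skip_reason(state: JudgeState, now: float) -> Optional[str]:
    """Return why the judge will not fire, or None if it may be called."""
    if not state.enabled:
        return "disabled"
    if not state.api_key:
        return "no_api_key"
    if now < state.cooldown_until:
        return "cooldown"
    if state.calls_this_minute >= state.max_calls_per_minute:
        return "rate_limited"
    return None


def decide(score: float, state: JudgeState, now: float) -> str:
    """Rules handle the obvious cases; only gray-zone scores reach the judge."""
    if score < GRAY_LOW:
        return "allow"
    if score > GRAY_HIGH:
        return "deny"
    if judge_skip_reason(state, now) is not None:
        return "allow"  # fail-open: a skipped judge does not block the call
    return "ask_judge"
```

For example, `decide(0.5, JudgeState(api_key=None), now=0.0)` returns `"allow"` because the judge has no API key and the sketch fails open, while the same score with a configured key returns `"ask_judge"`.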

Read post →
PII 2026-05-05  ·  8 min read

PII in tool calls vs PII in tool responses (and why most tools only catch one).

An AI agent tool call has two PII boundaries. Most products watch one. One major cloud gateway documents the gap explicitly: its PII filter does not detect PII in tool-use output parameters. We walk through the bidirectional model and the scenario where direction-1-only inspection silently fails.
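
A toy version of the bidirectional model, assuming the two boundaries are the call parameters going out and the tool response coming back. The two regexes and the function names are illustrative only; real PII detection needs far more than this.

```python
import json
import re

# Deliberately simple patterns, for illustration only.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_pii(value) -> list:
    """Return the sorted names of patterns that match anywhere in a JSON-able value."""
    text = json.dumps(value, ensure_ascii=False)
    return sorted(name for name, pattern in PII_PATTERNS.items() if pattern.search(text))


def inspect_call(params: dict, response: dict) -> dict:
    """Inspect both boundaries of one tool call."""
    return {"request": find_pii(params), "response": find_pii(response)}


# Hypothetical call: the request carries only an opaque ID,
# but the tool's answer carries the customer record.
params = {"customer_id": 4821}
response = {"name": "Jane Doe", "email": "jane@example.com", "ssn": "123-45-6789"}
report = inspect_call(params, response)
```

Here `report["request"]` is empty while `report["response"]` lists both patterns, which is the situation where inspecting only the call parameters reports a clean call.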

Read post →
INCIDENT RESPONSE 2026-05-05  ·  9 min read

The kill switch: how Clampd stops a rogue agent across 8 layers in milliseconds.

Detection without response is just logging. The 8-layer cascade we run when an agent gets killed: deny list, NATS broadcast, token cache flush, session termination, IdP revoke, registry update, event broadcast, audit log. Each layer is independent and fully idempotent. Latency budget per layer with the actual timings from cascade.rs.
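
The "independent and idempotent" property can be illustrated with a small sketch. This is not cascade.rs: the layers here are in-memory stand-ins, they run sequentially for simplicity, and `make_layer` and `kill` are hypothetical names.

```python
import time

LAYERS = [
    "deny_list", "nats_broadcast", "token_cache_flush", "session_termination",
    "idp_revoke", "registry_update", "event_broadcast", "audit_log",
]


def make_layer(name: str, state: dict):
    """Build an in-memory stand-in for one layer; set insertion is idempotent."""
    def run(agent_id: str) -> None:
        state.setdefault(name, set()).add(agent_id)
    return run


def kill(agent_id: str, layers: list) -> dict:
    """Run every layer; return {layer_name: (succeeded, seconds_taken)}."""
    results = {}
    for name, layer in layers:
        start = time.perf_counter()
        try:
            layer(agent_id)
            ok = True
        except Exception:
            # Independence: one failing layer must not stop the rest.
            ok = False
        results[name] = (ok, time.perf_counter() - start)
    return results


state = {}
layers = [(name, make_layer(name, state)) for name in LAYERS]
results = kill("agent-7", layers)
```

Calling `kill("agent-7", layers)` a second time leaves `state` unchanged, which is what makes retrying a partially failed cascade safe.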

Read post →
RED TEAM 2026-05-05  ·  9 min read

LLM red team payloads: our 556-payload corpus, 7 sources.

An open corpus of LLM red-team payloads that we run against every Clampd build: 85 prompt-injection variants, 67 exfil patterns, 56 SQL injection, 55 RCE, 52 LFI, 47 encoding evasion, 45 XSS, 42 SSRF, plus 40 deliberately safe inputs to catch false positives. Sources: OWASP, SecLists, Garak, Promptfoo, PayloadsAllTheThings, our own.

Read post →
AUTH 2026-05-05  ·  9 min read

Scope tokens: replacing "the agent has the DB password" with per-call Ed25519-signed tokens.

Almost every AI agent in production today holds raw downstream credentials. We mint a short-lived (5 min) Ed25519-signed token per tool call, bound to the (tool, params) pair via SHA-256, verified by the tool through JWKS. A captured token can't be replayed against a different tool, different params, or after the TTL.
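
A sketch of the binding and TTL checks. One loud caveat: to stay runnable on the Python standard library, HMAC-SHA256 stands in for the Ed25519 signature, so this version uses a shared secret where the real design uses a public key fetched through JWKS. The token layout, claim names, and function names are all assumptions.

```python
import base64
import hashlib
import hmac
import json

TTL_SECONDS = 300  # 5 minutes, as in the post


def binding_hash(tool: str, params: dict) -> str:
    """SHA-256 over a canonical encoding of the (tool, params) pair."""
    canonical = json.dumps({"tool": tool, "params": params},
                           sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def mint(key: bytes, tool: str, params: dict, now: float) -> str:
    claims = {"bind": binding_hash(tool, params), "exp": now + TTL_SECONDS}
    payload = json.dumps(claims, sort_keys=True).encode("utf-8")
    # STAND-IN: HMAC here; the post describes an Ed25519 signature instead.
    sig = hmac.new(key, payload, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(payload).decode("ascii") + "."
            + base64.urlsafe_b64encode(sig).decode("ascii"))


def verify(key: bytes, token: str, tool: str, params: dict, now: float) -> bool:
    try:
        payload_b64, sig_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        sig = base64.urlsafe_b64decode(sig_b64)
    except ValueError:  # wrong part count or bad base64
        return False
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):
        return False  # forged or tampered token
    claims = json.loads(payload)
    if now >= claims["exp"]:
        return False  # past the TTL
    # The token only authorizes the exact call it was minted for.
    return hmac.compare_digest(claims["bind"], binding_hash(tool, params))


key = b"demo-only-secret"
token = mint(key, "db.query", {"sql": "SELECT 1"}, now=1000.0)
```

The same token verifies for `("db.query", {"sql": "SELECT 1"})` before the expiry, and fails for a different tool, different params, or any time at or after `now + 300`.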

Read post →