/blog

Notes from building an AI agent firewall.

Posts on tool-call security, MCP supply chain risk, scope tokens, agent payments, and what we learn shipping Clampd. No hype. No listicles. No "10 ways to secure your LLM."

SECURITY 2026-05-05  ·  8 min read

MCP Rug Pulls: when the tool you approved isn't the tool you call

An MCP server can advertise one tool schema during discovery, get approved by your team, then mutate that schema on a future deploy. Your agent doesn't notice. Your LLM doesn't notice either. We walk through the attack, why it matters, and how pinning a 64-character SHA-256 hex digest of the approved schema fixes it.
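
A minimal sketch of the pinning idea, not Clampd's implementation: hash a canonical JSON encoding of the approved tool schema, store the digest, and compare it against whatever the server advertises later. The function names and the canonicalisation choice (sorted keys, compact separators) are assumptions made for illustration.

```python
import hashlib
import hmac
import json


def schema_fingerprint(tool_schema: dict) -> str:
    """Return the SHA-256 hex digest (64 characters) of a canonical JSON encoding."""
    # Sorted keys and fixed separators make the encoding independent of key order.
    canonical = json.dumps(tool_schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def matches_pin(tool_schema: dict, pinned_digest: str) -> bool:
    """True only if the advertised schema still hashes to the approved digest."""
    return hmac.compare_digest(schema_fingerprint(tool_schema), pinned_digest)
```

Any change to the schema after approval, including a reworded description, produces a different digest, so the call can be blocked before the agent ever sees the mutated tool.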

PERFORMANCE 2026-05-05  ·  9 min read

Real numbers: how fast is Clampd, actually?

44µs to evaluate 263 rules in-process. 5.14ms p50 / 7.92ms p95 end-to-end on a 2-vCPU production box, deny path. Hardware specs, sample sizes, what we couldn't measure, and the bench script you can run yourself. We also caught ourselves saying "sub-10ms typical" on the marketing site, which the data didn't support.

PAYMENTS 2026-05-05  ·  10 min read

Agent payments are here. Here's what your security tool isn't doing about it.

Two payment protocols built for AI agents shipped specs in the last year: Google's AP2 and the open x402 standard, built on HTTP 402. Both let an agent move real money. We walk through the seven threat patterns nobody's wired up for (including mandate replay, chain swap, and payee swap), and what we built into the Clampd gateway to enforce policy at the protocol layer.

DETECTION 2026-05-05  ·  9 min read

Session layers: 16 patterns we use to catch multi-step agent attacks.

Single-call inspection misses scrape-then-exfiltrate, slow-burn data pulls, sawtooth evasion, and cross-agent privilege escalation. Here are the 16 cross-call patterns we run on every tool call, what each fires on, and a worked example of a slow-drip exfiltration that survives per-call defence and dies at the session layer.
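
To make the slow-drip case concrete, here is a hedged sketch of one cross-call check: every call stays under a per-call row limit, but a sliding-window total for the session eventually crosses a threshold. The class name, the limits, and the verdict strings are all hypothetical and are not taken from the 16 patterns themselves.

```python
from collections import deque


class SlowDripDetector:
    """Tracks rows returned per session over a sliding time window."""

    def __init__(self, per_call_limit=100, session_limit=500, window_seconds=3600.0):
        self.per_call_limit = per_call_limit
        self.session_limit = session_limit
        self.window_seconds = window_seconds
        self._events = deque()  # (timestamp, rows) pairs inside the window
        self._total = 0

    def observe(self, timestamp: float, rows: int) -> str:
        # Per-call defence: a single oversized pull is rejected immediately.
        if rows > self.per_call_limit:
            return "deny:per_call"
        # Expire events that have fallen out of the window.
        while self._events and timestamp - self._events[0][0] > self.window_seconds:
            _, old_rows = self._events.popleft()
            self._total -= old_rows
        self._events.append((timestamp, rows))
        self._total += rows
        # Session defence: many small pulls that add up are rejected here.
        if self._total > self.session_limit:
            return "deny:session"
        return "allow"
```

With these example limits, a run of 60-row calls one minute apart passes the per-call check every time and is only stopped once the windowed total exceeds 500 rows.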

COMPLIANCE 2026-05-05  ·  8 min read

OWASP LLM Top 10 to 263 runtime rules: a complete mapping.

Category-by-category mapping of the OWASP LLM Top 10 (2023) to the 263 detection rules in our engine. Per-category counts, what we cover, where the gaps are, and the v2025 reshuffle. Includes a paste-able paragraph for vendor questionnaires.

ARCHITECTURE 2026-05-05  ·  9 min read

LLM-as-Judge: when (and why) we let an LLM grade our security decisions.

Hybrid security: regex for the obvious, LLM judge for the gray zone (default 0.2–0.75). The four conditions under which the judge does NOT fire: disabled, no API key, cooldown after consecutive failures, per-minute rate limit. Why fail-open is the default, and what the judge actually sees.
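
The gating logic described above can be sketched roughly as follows. This assumes the 0.2–0.75 range applies to a numeric risk score with inclusive bounds; the field names and the per-minute limit of 60 are hypothetical, chosen only to make the four skip conditions explicit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JudgeState:
    enabled: bool = True
    api_key: Optional[str] = None
    in_cooldown: bool = False        # set after consecutive judge failures
    calls_this_minute: int = 0
    max_calls_per_minute: int = 60   # hypothetical limit
    gray_low: float = 0.2            # default gray-zone bounds from the post
    gray_high: float = 0.75


def should_call_judge(score: float, state: JudgeState) -> bool:
    """Return True only when the judge is available and the score is in the gray zone."""
    if not state.enabled:
        return False  # condition 1: judge disabled
    if not state.api_key:
        return False  # condition 2: no API key configured
    if state.in_cooldown:
        return False  # condition 3: cooling down after failures
    if state.calls_this_minute >= state.max_calls_per_minute:
        return False  # condition 4: per-minute rate limit reached
    return state.gray_low <= score <= state.gray_high
```

Scores outside the gray zone never reach the judge in this sketch; they are left to the deterministic rules.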

PII 2026-05-05  ·  8 min read

PII in tool calls vs PII in tool responses (and why most tools only catch one).

An AI agent tool call has two PII boundaries. Most products watch one. One major cloud gateway documents the gap explicitly: its PII filter does not detect PII in tool-use output parameters. We walk through the bidirectional model and the scenario where direction-1-only inspection silently fails.
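
As a rough illustration of the two boundaries, the sketch below scans the arguments before the tool runs and the response after it returns. The single email regex stands in for a real PII detector, and `guarded_call` is a hypothetical wrapper, not a Clampd API.

```python
import json
import re

# Deliberately simple stand-in for a PII detector: email addresses only.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_pii(value) -> list:
    """Return email-like strings found anywhere in a JSON-serialisable value."""
    return EMAIL.findall(json.dumps(value))


def guarded_call(tool, args: dict):
    """Inspect both directions: arguments going in, response coming out."""
    findings = {"request": find_pii(args)}    # direction 1: tool-call parameters
    response = tool(**args)
    findings["response"] = find_pii(response)  # direction 2: tool response
    return response, findings
```

A lookup by customer ID carries no PII in its arguments, so inspecting only direction 1 reports nothing even when the response contains an email address.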

INCIDENT RESPONSE 2026-05-05  ·  9 min read

The kill switch: how Clampd stops a rogue agent across 8 layers in milliseconds.

Detection without response is just logging. The 8-layer cascade we run when an agent gets killed: deny list, NATS broadcast, token cache flush, session termination, IdP revoke, registry update, event broadcast, audit log. Each layer is independent and fully idempotent. Latency budget per layer with the actual timings from cascade.rs.
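
A sketch of what "independent and idempotent" can mean in practice, written in Python for brevity rather than mirroring cascade.rs. The layer names follow the list above; the handler mapping and the sequential loop are assumptions, and a real cascade may well run layers concurrently.

```python
from typing import Callable, Dict

LAYERS = [
    "deny_list", "nats_broadcast", "token_cache_flush", "session_termination",
    "idp_revoke", "registry_update", "event_broadcast", "audit_log",
]


def run_cascade(agent_id: str, handlers: Dict[str, Callable[[str], None]]) -> Dict[str, str]:
    """Run every layer for agent_id; one layer failing never blocks the others."""
    results = {}
    for name in LAYERS:
        try:
            handlers[name](agent_id)  # each handler must be safe to call repeatedly
            results[name] = "ok"
        except Exception as exc:
            # Record the failure and continue with the remaining layers.
            results[name] = f"failed: {exc}"
    return results
```

If every handler is a set-style operation (add to a deny list, delete a cache entry), running the cascade twice for the same agent leaves the system in the same state as running it once.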

RED TEAM 2026-05-05  ·  9 min read

We red-teamed our own product: 556 payloads, 7 sources, what we learned.

The regression corpus we run against every Clampd build: 85 prompt-injection variants, 67 exfil patterns, 56 SQL injection, 55 RCE, 52 LFI, 47 encoding evasion, 45 XSS, 42 SSRF, plus 40 deliberately safe inputs to catch false positives. Sources: OWASP, SecLists, Garak, Promptfoo, PayloadsAllTheThings, our own.

AUTH 2026-05-05  ·  9 min read

Scope tokens: replacing "the agent has the DB password" with per-call Ed25519-signed tokens.

Almost every AI agent in production today holds raw downstream credentials. We mint a short-lived (5 min) Ed25519-signed token per tool call, bound to the (tool, params) pair via SHA-256, verified by the tool through JWKS. A captured token can't be replayed against a different tool, different params, or after the TTL.
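
The binding and TTL checks can be sketched with the standard library alone. This example deliberately leaves out the Ed25519 signature and JWKS verification and shows only why a token minted for one call does not fit another. The claim names `bind` and `exp` and the canonical JSON encoding are assumptions made for illustration.

```python
import hashlib
import hmac
import json

TTL_SECONDS = 300  # the 5-minute lifetime mentioned above


def call_binding(tool: str, params: dict) -> str:
    """SHA-256 hex digest over a canonical encoding of the (tool, params) pair."""
    canonical = json.dumps({"tool": tool, "params": params},
                           sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def mint_claims(tool: str, params: dict, now: float) -> dict:
    """Build the claims that would then be signed (signing omitted in this sketch)."""
    return {"bind": call_binding(tool, params), "exp": now + TTL_SECONDS}


def claims_allow(claims: dict, tool: str, params: dict, now: float) -> bool:
    """Tool-side check: token not expired and bound to exactly this call."""
    if now >= claims["exp"]:
        return False
    return hmac.compare_digest(claims["bind"], call_binding(tool, params))
```

A verifier would run `claims_allow` only after the signature check succeeds; changing the tool name, any parameter, or waiting past the TTL makes the check fail.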
