Anyone can write "sub-10ms typical" on a landing page. Whether the page survives an honest reader running a stopwatch is a different question. This post is the stopwatch.

Two numbers we'll lead with, and we'll show our work for both.

44.3 µs Engine, full pipeline
5.14 ms Gateway p50, deny path
7.92 ms Gateway p95, deny path
263 Rules evaluated each call

The first is in-process rule evaluation. The second is end-to-end through the gateway, including auth, gRPC roundtrips between services, normalisation, classification, policy decision, and audit. Different layers, different hardware, different scopes. Both are real measurements.

What we measured, and on what

Two test environments. We ran the engine bench on a developer laptop and the end-to-end gateway bench on a small production VPS. The gap between them matters when you read the rest of the post, so we want it stated up front.

| Environment | CPU | RAM | What it ran |
| --- | --- | --- | --- |
| Bench host | AMD Ryzen 5 5600H, 12 threads, ~3.3 GHz | 27 GB | cargo bench, engine layer only |
| Deploy host | AMD EPYC-Rome, 2 vCPU @ 2.0 GHz | 3.7 GB | Full gateway with NATS, Redis, Postgres, ClickHouse |

The bench host is laptop-class. The deploy host is the smallest VPS that runs the full stack reasonably. Numbers from one don't generalise to the other. We deliberately chose hardware at both ends so readers can see the floor (the smallest box that runs the stack) and a typical developer machine.

The engine: 44 µs for 263 rules

The detection engine is what runs against every tool call. It compiles 263 rules at startup, then evaluates each request through five layers: a fast funnel, regex rules, dictionary scan, signal compounding, and normalisation for encoded payloads.
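The post doesn't show how the normalisation layer handles encoded payloads. As an illustration only, here is a minimal Python sketch that assumes base64 is the only encoding and that decoded text is fed back through the same keyword check (the breakdown below does describe the funnel running again "after base64 unwrap"). `normalise_variants`, `contains_keyword` and the token regex are hypothetical names, not Clampd's code.

```python
import base64
import binascii
import re

# Hypothetical: runs of characters that could be base64 (length checked later).
BASE64_TOKEN = re.compile(r"[A-Za-z0-9+/]{8,}={0,2}")


def normalise_variants(text: str) -> list[str]:
    """Return the input plus any base64-decoded variants found inside it.

    Sketch only: a real normaliser would handle more encodings (URL, hex,
    Unicode escapes) and guard against oversized or nested payloads.
    """
    variants = [text]
    for token in BASE64_TOKEN.findall(text):
        if len(token) % 4 != 0:
            continue  # not a well-formed base64 string
        try:
            decoded = base64.b64decode(token, validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            continue  # looked like base64 but wasn't valid, or isn't text
        variants.append(decoded)
    return variants


def contains_keyword(text: str, keywords: set[str]) -> bool:
    """Check every normalised variant, so encoding can't hide a keyword."""
    lowered = [v.lower() for v in normalise_variants(text)]
    return any(k in v for v in lowered for k in keywords)


payload = "run this: " + base64.b64encode(b"DROP TABLE users").decode()
print(contains_keyword(payload, {"drop table"}))  # True
```

Scanning the decoded variant with the same check keeps the rules themselves encoding-agnostic; the cost is an extra pass per decoded token.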

The full pipeline benchmark drives a JSON tool-call payload through every layer and reports criterion mean.

| Bench | Mean | What it does |
| --- | --- | --- |
| stage_full_safe | 44.3 µs | Full engine on a clean SQL query |
| stage_full_attack | 36.4 µs | Full engine on a destructive SQL payload |
| stage_full_encoded | 28.6 µs | Full engine on a base64-encoded attack |

The attack path is faster than the safe path. That looks counter-intuitive until you remember rule evaluation can short-circuit the moment the score crosses the block threshold. Safe input sweeps every rule because nothing fires. Worth knowing if you ever care about adversarial latency.
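The short-circuit effect is easy to see in a toy model. This Python sketch assumes each rule adds a fixed score when it matches and that evaluation stops as soon as the running total reaches a block threshold; `Rule`, the weights and `BLOCK_THRESHOLD = 1.0` are illustrative, not Clampd's actual scoring.

```python
import re
from dataclasses import dataclass

BLOCK_THRESHOLD = 1.0  # hypothetical threshold, for illustration only


@dataclass(frozen=True)
class Rule:
    name: str
    pattern: re.Pattern
    score: float  # hypothetical weight added when the rule matches


def evaluate(text: str, rules: list[Rule]) -> tuple[float, int]:
    """Return (total score, number of rules evaluated).

    Stops early once the score crosses the block threshold, so an attack
    that trips an early rule touches fewer rules than clean input does.
    """
    total = 0.0
    evaluated = 0
    for rule in rules:
        evaluated += 1
        if rule.pattern.search(text):
            total += rule.score
            if total >= BLOCK_THRESHOLD:
                break  # the decision is already "block"; skip the rest
    return total, evaluated


rules = [
    Rule("drop_table", re.compile(r"\bdrop\s+table\b", re.I), 1.0),
    Rule("truncate", re.compile(r"\btruncate\b", re.I), 1.0),
    Rule("union_select", re.compile(r"\bunion\s+select\b", re.I), 0.6),
    Rule("comment_tail", re.compile(r"--\s*$"), 0.3),
]

print(evaluate("SELECT id FROM users WHERE id = 1", rules))  # (0.0, 4)
print(evaluate("DROP TABLE users", rules))                   # (1.0, 1)
```

Clean input pays for every rule; a blatant attack pays for one. That asymmetry is also why rule ordering matters for adversarial latency.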

The breakdown by layer shows where time goes inside a single 44µs evaluation.

| Layer | Mean | Notes |
| --- | --- | --- |
| stage_json_extract | 285 ns | Pull tool params out of the request body |
| stage_context_create | 174 ns | Build the per-call evaluation context |
| stage_funnel_clean | 2.57 µs | Aho-Corasick funnel on plaintext |
| stage_funnel_base64 | 3.99 µs | Same funnel after base64 unwrap |
| stage_normalize | 5.06 µs | Multi-step normalisation for encoded variants |
| stage_l1_rules_safe | 7.23 µs | L1 regex layer, safe input, all 263 rules |
| stage_l1_rules_attack | 7.22 µs | L1 regex layer, attack input |
| stage_l2_dictionary | 67 ns | L2 keyword dictionary scan |
| stage_l3_signals | 1.71 µs | L3 compound signal scoring |
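A quick check of the "more than the sum of its parts" point below: summing the per-layer means for a clean input gives about 17 µs against 44.3 µs for the full pipeline. The sketch assumes a clean input runs every stage once except the base64 funnel and the attack variant of the L1 bench; which stages really run for a given input is our assumption.

```python
# Per-stage means from the breakdown table, in nanoseconds.
STAGE_MEANS_NS = {
    "stage_json_extract": 285,
    "stage_context_create": 174,
    "stage_funnel_clean": 2_570,
    "stage_funnel_base64": 3_990,
    "stage_normalize": 5_060,
    "stage_l1_rules_safe": 7_230,
    "stage_l1_rules_attack": 7_220,
    "stage_l2_dictionary": 67,
    "stage_l3_signals": 1_710,
}
FULL_SAFE_NS = 44_300  # stage_full_safe

# Assumption: clean input skips the base64 funnel and the attack L1 variant.
SKIPPED = {"stage_funnel_base64", "stage_l1_rules_attack"}
parts_ns = sum(v for k, v in STAGE_MEANS_NS.items() if k not in SKIPPED)

print(f"sum of stages: {parts_ns / 1000:.1f} µs")                  # 17.1 µs
print(f"full pipeline: {FULL_SAFE_NS / 1000:.1f} µs")               # 44.3 µs
print(f"unaccounted:   {(FULL_SAFE_NS - parts_ns) / 1000:.1f} µs")  # 27.2 µs
```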

Two things worth pointing at. The full pipeline costs more than the sum of its parts because of context propagation between layers. And the L2 dictionary, at 67 nanoseconds, runs a single-pass Aho-Corasick scan across thousands of keywords, which is one of the reasons we put it before the more expensive regex layer.
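For readers who haven't met Aho-Corasick: it compiles all keywords into one trie with failure links, so a single left-to-right pass finds every occurrence of every keyword, however many keywords there are. Here is a minimal textbook version in Python for illustration; it is not the engine's implementation, and `build_automaton` and `scan` are our names.

```python
from collections import deque


def build_automaton(keywords):
    """Build goto/fail/output tables for Aho-Corasick over the keywords."""
    goto = [{}]     # goto[state][char] -> next state
    fail = [0]      # failure link per state
    out = [set()]   # keywords recognised on reaching a state

    for word in keywords:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(word)

    # Breadth-first pass: fail links point to the longest proper suffix
    # that is also a prefix of some keyword.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] |= out[fail[nxt]]  # inherit matches ending here
    return goto, fail, out


def scan(text, automaton):
    """Single pass over text; returns (end_index, keyword) for every match."""
    goto, fail, out = automaton
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in out[state]:
            hits.append((i, word))
    return hits


ac = build_automaton(["he", "she", "his", "hers"])
print(sorted(scan("ushers", ac)))  # [(3, 'he'), (3, 'she'), (5, 'hers')]
```

Scan cost grows with the input length plus the number of matches, not with the size of the keyword list, which is what makes a dictionary of thousands of terms this cheap.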

One number that's not per-request

The bench output also reports compile_110_builtin_rules at 286 ms. That's the one-time cost of parsing 13 TOML files, compiling 263 regex patterns, and validating the taxonomy on gateway startup. It's amortised across every request the gateway serves for the rest of its uptime. You pay it once when the process starts, then 44 microseconds per evaluation (on the bench host) from there on.
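A back-of-the-envelope view of that amortisation, using the two bench numbers as-is (286 ms compile, 44.3 µs per full evaluation on the bench host). The helper name is ours.

```python
COMPILE_S = 0.286   # compile_110_builtin_rules, one-off at startup
EVAL_S = 44.3e-6    # stage_full_safe mean, per evaluation


def amortised_compile_us(requests: int) -> float:
    """Startup compile cost spread over `requests` evaluations, in µs."""
    return COMPILE_S / requests * 1e6


# Evaluations after which total evaluation time equals the one-off compile.
break_even = COMPILE_S / EVAL_S

print(f"break-even after ~{break_even:,.0f} evaluations")                  # ~6,456
print(f"{amortised_compile_us(1_000_000):.3f} µs/request at 1M requests")  # 0.286
```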

The gateway: 5ms p50 end-to-end on 2 vCPUs

The engine bench tells you the cost of the rules. The gateway bench tells you what shows up on the wire. Auth, agent identification, normalisation, the gRPC roundtrip to the intent service, policy decision, scope token mint, audit emit. All of it.

We drove a 540-call burst against the production gateway, rate-limited so the demo org could absorb it. Only 42 calls produced complete stage-latency log lines. That's a smaller sample than we wanted; we'll explain why under "What we couldn't measure" below.

| Percentile | End-to-end |
| --- | --- |
| p50 | 5.14 ms |
| p90 | 6.80 ms |
| p95 | 7.92 ms |
| p99 | 10.53 ms |
| mean | 5.33 ms |
| n | 42 |
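The post doesn't say which percentile definition produced the table. As an assumption, here is the nearest-rank method in Python. With n = 42 it makes p99 the single slowest sample and p95 the 40th of 42, so if the table used this method, the 10.53 ms p99 is one call, not a trend.

```python
import math


def nearest_rank(samples, pct):
    """Nearest-rank percentile: the ceil(pct/100 * n)-th smallest sample."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


n = 42
ranks = list(range(1, n + 1))  # stand-in samples whose value equals their rank
for pct in (50, 90, 95, 99):
    print(pct, nearest_rank(ranks, pct))
# 50 21
# 90 38
# 95 40
# 99 42
```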

The per-stage breakdown shows nothing dominating. At p50, each stage that ran takes between roughly 0.4 and 1 ms, which is about what we'd expect from gRPC roundtrips between services on the same box.

| Stage | p50 | p95 | What happens |
| --- | --- | --- | --- |
| auth | 0.55 ms | 1.38 ms | JWT verify, kill-list check |
| license_ratelimit | 0.81 ms | 1.85 ms | Per-org budget check in Redis |
| identify_delegate | 0.79 ms | 2.12 ms | gRPC to ag-registry, agent profile |
| normalize_extract | 0.75 ms | 2.75 ms | Decode and JSON-extract tool params |
| session_toolauth | 0.55 ms | 2.16 ms | Per-tool grant check, session budget |
| boundary_check | 0.38 ms | 0.70 ms | Trust boundary enforcement |
| classify_grpc | 1.00 ms | 1.76 ms | gRPC to ag-intent, the engine fires here |
| impersonation_match | 0.00 ms | 0.00 ms | Skipped for these calls |
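Adding up the per-stage p50s gives a rough consistency check against the 5.14 ms end-to-end p50. Medians aren't strictly additive, so treat this as a sanity check rather than a decomposition.

```python
STAGE_P50_MS = {
    "auth": 0.55,
    "license_ratelimit": 0.81,
    "identify_delegate": 0.79,
    "normalize_extract": 0.75,
    "session_toolauth": 0.55,
    "boundary_check": 0.38,
    "classify_grpc": 1.00,
    "impersonation_match": 0.00,
}
END_TO_END_P50_MS = 5.14

total = sum(STAGE_P50_MS.values())
largest = max(STAGE_P50_MS, key=STAGE_P50_MS.get)

print(f"sum of stage p50s: {total:.2f} ms vs {END_TO_END_P50_MS} ms end-to-end")
print(f"largest stage: {largest} at {STAGE_P50_MS[largest] / total:.0%} of the sum")
# sum of stage p50s: 4.83 ms vs 5.14 ms end-to-end
# largest stage: classify_grpc at 21% of the sum
```

The remaining ~0.3 ms is plausibly work outside the named stages, though the post doesn't break it down.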

That 1.00 ms classify_grpc p50 includes the engine evaluation (44 µs on the laptop bench) plus the gRPC roundtrip. Most of that millisecond is network and serialisation. Even on the much slower production CPU, the engine itself stays well inside that envelope.
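To put "well inside that envelope" in numbers, this sketch scales the laptop's 44.3 µs by the clock-speed ratio between the two hosts. Clock scaling is a crude assumption (it ignores IPC, cache and turbo differences between Zen generations), so read the result as an order-of-magnitude estimate only.

```python
ENGINE_LAPTOP_US = 44.3   # stage_full_safe on the Ryzen 5 5600H
LAPTOP_GHZ = 3.3          # bench host clock, approximate
VPS_GHZ = 2.0             # deploy host vCPU clock
CLASSIFY_P50_US = 1000.0  # classify_grpc p50 on the deploy host

# Crude assumption: runtime scales inversely with clock speed.
engine_vps_us = ENGINE_LAPTOP_US * LAPTOP_GHZ / VPS_GHZ

print(f"estimated engine time on the VPS: {engine_vps_us:.0f} µs")           # 73 µs
print(f"share of classify_grpc p50: {engine_vps_us / CLASSIFY_P50_US:.0%}")  # 7%
```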

v1.0 refresh: same box, real SDK in the path

Everything above measured the gateway with hey, an HTTP load generator, hitting /v1/proxy directly with crafted JSON. After v1.0 shipped, with Ed25519 enrollment replacing the old shared-secret JWT, we re-ran the latency test through the actual TypeScript SDK to see what an honest end-to-end integration feels like.

The harness runs on the deploy host itself and hits ag-gateway:8080 over the Docker network. It makes 60 calls across clampd.guard(), clampd.scanInput(), and clampd.scanOutput(), with wall-clock time measured client-side using performance.now(). It's the same 2 vCPU AMD EPYC-Rome box as the gateway bench above.
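The harness itself isn't published here. A minimal Python analogue of the approach (client-side wall-clock timing around each call, then percentiles per method) might look like the sketch below. `time_calls` and `summarise` are hypothetical helpers, the lambdas are stand-ins rather than the Clampd SDK, and the 20-calls-per-method split is our assumption; the post only says 60 calls across the three methods.

```python
import math
import statistics
import time
from typing import Callable


def time_calls(fn: Callable[[], object], n: int) -> list[float]:
    """Call fn n times and return each call's wall-clock latency in ms."""
    latencies = []
    for _ in range(n):
        start = time.perf_counter()  # monotonic, high-resolution clock
        fn()
        latencies.append((time.perf_counter() - start) * 1000)
    return latencies


def summarise(latencies: list[float]) -> dict[str, float]:
    """p50/p95/p99 by nearest rank, plus the mean."""
    ordered = sorted(latencies)

    def pct(p: float) -> float:
        return ordered[max(1, math.ceil(p / 100 * len(ordered))) - 1]

    return {"p50": pct(50), "p95": pct(95), "p99": pct(99),
            "mean": statistics.fmean(ordered)}


# Stand-ins for the three SDK methods; replace with real client calls.
methods = {
    "guard": lambda: time.sleep(0.005),
    "scan_input": lambda: time.sleep(0.003),
    "scan_output": lambda: time.sleep(0.003),
}

for name, call in methods.items():
    stats = summarise(time_calls(call, 20))
    print(name, {k: round(v, 2) for k, v in stats.items()})
```

Timing on the client, as the post does with performance.now(), captures serialisation and connection handling that server-side stage logs miss.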

3 ms SDK p50, overall
11 ms SDK p95, overall
12 ms SDK p99, overall
60 Calls, mixed traffic

| SDK call | p50 | p95 | What it hits |
| --- | --- | --- | --- |
| clampd.guard() | 5 ms | 7 ms | POST /v1/proxy, full classify and scope check |
| clampd.scanInput() | 3 ms | 6 ms | POST /v1/scan-input, prompt scan only |
| clampd.scanOutput() | 3 ms | 7 ms | POST /v1/scan-output, PII and secrets scan |

The story roughly matches the gateway bench. Per-call cost comes from JWT verification, the agent profile fetch, JSON marshalling, and the rule engine itself. v1.0 added per-request EdDSA signing, which costs about a hundred microseconds and sits comfortably in the noise.

The same probe, run from a laptop in Tallinn through an SSH tunnel to an EU-West VPS, comes in at a p50 of 159 ms. Nearly all of that gap is network: a WAN round-trip on this path is roughly 50 ms, against the ~5 ms we'd see locally. Same SDK code, same gateway, just network distance. If your worker runs in the same region as your Clampd cluster, the local number is what you'll feel. If it runs an ocean away, add the cross-region round-trip time your CDN or network monitoring reports.

What this measurement caught that the May 5 numbers didn't

The hey benchmark didn't sign JWTs, didn't manage per-agent identity, didn't deal with the SDK's loop detection or retry logic. The 3ms p50 here includes all of that. If anything, the v1.0 stack is a hair faster than v0.20 was, mostly thanks to an in-process agent-profile cache that landed in late May.

The honest caveat: when the LLM Judge fires, latency changes

Clampd has an optional gray-zone escalation path. When a request scores between two configurable thresholds (default 0.2 and 0.75) and the rules engine isn't confident, the gateway hands the request to an LLM-as-Judge for a semantic second opinion. When that fires, end-to-end latency changes character.

Honest framing

From a smaller earlier sample (n=6) where the LLM Judge fallback fired, end-to-end latency was 200 to 400 milliseconds. Almost all of that is the upstream LLM API roundtrip, not Clampd. Our pipeline accounts for ~5ms of it. But if you write "sub-10ms" on a landing page and your reader benchmarks a Suspicious-class call, they'll see 300ms and quote you. So we don't write "sub-10ms" anywhere any more.

The escalation is opt-in, and we've published the configurable thresholds so teams can tune them or disable the layer entirely. In self-hosted deployments it ships disabled unless you set MODEL_ESCALATION_ENABLED=true.
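The routing rule can be sketched as a small function. The post gives the default thresholds (0.2 and 0.75) and the MODEL_ESCALATION_ENABLED switch, but not whether the bounds are inclusive or how the judge is invoked. The `Route` names, the half-open interval, and treating anything other than `true` as "off" are our assumptions.

```python
import os
from enum import Enum

LOW = 0.2    # default lower gray-zone threshold
HIGH = 0.75  # default upper gray-zone threshold


class Route(Enum):
    RULES_ONLY = "rules_only"  # confident: the rules-engine verdict stands
    LLM_JUDGE = "llm_judge"    # gray zone: ask the LLM-as-Judge


def escalation_enabled(env=None) -> bool:
    """Read the opt-in switch (assumption: only 'true' enables it)."""
    env = os.environ if env is None else env
    return env.get("MODEL_ESCALATION_ENABLED", "").strip().lower() == "true"


def route(score: float, enabled: bool, low: float = LOW, high: float = HIGH) -> Route:
    """Send only gray-zone scores to the judge, and only when enabled.

    Assumption: the gray zone is low <= score < high; real bounds may differ.
    """
    if enabled and low <= score < high:
        return Route.LLM_JUDGE
    return Route.RULES_ONLY


print(route(0.05, enabled=True))   # Route.RULES_ONLY
print(route(0.40, enabled=True))   # Route.LLM_JUDGE
print(route(0.40, enabled=False))  # Route.RULES_ONLY
print(route(0.90, enabled=True))   # Route.RULES_ONLY
```

Only calls that land on the LLM_JUDGE branch pay the 200 to 400 ms upstream roundtrip; everything else stays on the millisecond path.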

What we couldn't measure

Before anyone runs the same script and gets different numbers, here's what's missing from this post.

Reproducing this on your own machine

Engine bench, on any machine that can build the workspace.

# Clone, then from the services directory:
cargo bench -p ag-engine --bench evaluate
cargo bench -p ag-engine --bench stages
cargo bench -p ag-intent --bench classify

# Each writes a criterion HTML report under target/criterion/
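If you'd rather read the means from a script than open the HTML report, criterion also writes JSON alongside it. The sketch below assumes the usual criterion layout, a `new/estimates.json` per benchmark under `target/criterion/` with point estimates in nanoseconds; check what your criterion version actually writes before relying on it. `read_means` is our own helper, not part of the Clampd repo.

```python
import json
from pathlib import Path


def read_means(root: str = "target/criterion") -> dict[str, float]:
    """Map each benchmark directory to its mean point estimate in µs."""
    means = {}
    base = Path(root)
    if not base.is_dir():
        return means  # nothing benchmarked yet
    for est in sorted(base.glob("**/new/estimates.json")):
        data = json.loads(est.read_text())
        bench = est.parent.parent.relative_to(base).as_posix()
        means[bench] = data["mean"]["point_estimate"] / 1000  # ns -> µs
    return means


if __name__ == "__main__":
    for bench, mean_us in read_means().items():
        print(f"{bench:40s} {mean_us:10.2f} µs")
```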

End-to-end gateway bench, against a local docker stack.

# From the clampd/ compose directory:
docker compose --profile full up -d
sleep 30   # let services pass health checks

# 500 requests, 10 concurrent, against the safe path:
hey -n 500 -c 10 -m POST \
  -H "Content-Type: application/json" \
  -H "X-AG-Key: clmpd_demo_key" \
  -d '{"agent_id":"b0000000-0000-0000-0000-000000000001",
       "tool":"db.query","params":{"sql":"SELECT 1"}}' \
  http://localhost:8080/v1/proxy

# Per-stage breakdown lives in the gateway log:
docker compose logs ag-gateway 2>&1 \
  | grep "Stage latency breakdown" \
  | tail -500
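hey also prints its own latency distribution at the end of each run, which is a useful cross-check on the per-stage log lines. This Python sketch parses that block. It assumes hey's default text summary, with lines such as `50% in 0.0051 secs`, and the format may differ between hey versions. `parse_hey_latency` is a hypothetical helper and the sample values are made up.

```python
import re

# One line of hey's "Latency distribution" block, e.g. "  50% in 0.0051 secs".
LINE = re.compile(r"^\s*(\d+(?:\.\d+)?)%\s+in\s+([\d.]+)\s+secs\s*$", re.M)


def parse_hey_latency(summary: str) -> dict[float, float]:
    """Map percentile -> latency in ms from hey's text summary."""
    return {float(p): round(float(s) * 1000, 3) for p, s in LINE.findall(summary)}


sample = """Latency distribution:
  10% in 0.0029 secs
  50% in 0.0051 secs
  99% in 0.0105 secs
"""  # made-up values, not from our runs
print(parse_hey_latency(sample))
```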

Numbers won't match ours exactly. Hardware, host noise, kernel version, and how warm the JIT-style ruleset is all matter. They should land in the same order of magnitude on a comparable machine.

What changed on the marketing site this week

Writing this post forced us to re-read our own copy. The phrase "sub-10ms typical latency" appeared in 13 places, including the meta description, the hero stat, two FAQ answers, and the JSON-LD schema. With a deny-path p95 of 7.92 ms and an unmeasured allow path, we couldn't honestly defend "typical" without more samples. So we changed it.

The site now leads with "44µs rule evaluation" (bench-backed and reproducible) and "single-digit ms end-to-end on commodity hardware" (deliberately wide enough to be true even after the 500-sample run lands). When we measure the allow path properly, we'll narrow the language. Until then, the homepage and this post agree.

Try Clampd in 60 seconds

One line of Python or TypeScript. Works with OpenAI, Anthropic, LangChain, CrewAI, Google ADK, and any MCP server. Self-hosted, open-core, no telemetry by default.

pip install clampd
npm install @clampd/sdk