Anyone can write "sub-10ms typical" on a landing page. Whether the page survives an honest reader running a stopwatch is a different question. This post is the stopwatch.

We'll lead with two numbers, and we'll show our work for both.

- 44.3 µs: engine, full pipeline
- 5.14 ms: gateway p50, deny path
- 7.92 ms: gateway p95, deny path
- 263: rules evaluated each call

The first is in-process rule evaluation. The second is end-to-end through the gateway, including auth, gRPC roundtrips between services, normalisation, classification, policy decision, and audit. Different layers, different hardware, different scopes. Both are real measurements.

What we measured, and on what

Two test environments. We ran the engine bench on a developer laptop and the end-to-end gateway on a small production VPS. The gap between them matters when you read the rest of the post, so we want it stated up front.

| Environment | CPU | RAM | What it ran |
|---|---|---|---|
| Bench host | AMD Ryzen 5 5600H, 12 threads, ~3.3 GHz | 27 GB | `cargo bench`, engine layer only |
| Deploy host | AMD EPYC-Rome, 2 vCPU @ 2.0 GHz | 3.7 GB | Full gateway with NATS, Redis, Postgres, ClickHouse |

The bench host is laptop-class. The deploy host is the smallest VPS that runs the full stack reasonably. Numbers from one don't generalise to the other. We deliberately chose hardware at both ends of the range so readers can see both the floor and the typical case.

The engine: 44 µs for 263 rules

The detection engine is what runs against every tool call. It compiles 263 rules at startup, then evaluates each request through five layers: a fast funnel, regex rules, dictionary scan, signal compounding, and normalisation for encoded payloads.

The full pipeline benchmark drives a JSON tool-call payload through every layer and reports criterion mean.

| Bench | Mean | What it does |
|---|---|---|
| `stage_full_safe` | 44.3 µs | Full engine on a clean SQL query |
| `stage_full_attack` | 36.4 µs | Full engine on a destructive SQL payload |
| `stage_full_encoded` | 28.6 µs | Full engine on a base64-encoded attack |

The attack path is faster than the safe path. That looks counter-intuitive until you remember rule evaluation can short-circuit the moment the score crosses the block threshold. Safe input sweeps every rule because nothing fires. Worth knowing if you ever care about adversarial latency.
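The short-circuit is easy to sketch. A minimal, hypothetical scorer follows; the `Rule` type, the weights, and the matcher are ours for illustration, not Clampd's actual internals:

```rust
// Hypothetical sketch of score short-circuiting. `Rule`, its weight, and
// the substring matcher stand in for the real compiled regex rules.
struct Rule {
    weight: f32,
}

impl Rule {
    fn matches(&self, input: &str) -> bool {
        // Stand-in for the real regex evaluation.
        input.contains("DROP TABLE")
    }
}

/// Evaluate rules in order, stopping the moment the cumulative score
/// crosses the block threshold. Attacks that fire early rules exit fast;
/// clean input pays for every rule because nothing fires.
fn evaluate(rules: &[Rule], input: &str, block_threshold: f32) -> (f32, usize) {
    let mut score = 0.0;
    for (i, rule) in rules.iter().enumerate() {
        if rule.matches(input) {
            score += rule.weight;
            if score >= block_threshold {
                return (score, i + 1); // rules after i never run
            }
        }
    }
    (score, rules.len()) // safe input swept every rule
}
```

With this shape, an obviously destructive payload finishes after a couple of rules while a clean query walks all of them, which is exactly the safe-slower-than-attack pattern in the table above.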

The breakdown by layer shows where time goes inside a single 44µs evaluation.

| Layer | Mean | Notes |
|---|---|---|
| `stage_json_extract` | 285 ns | Pull tool params out of the request body |
| `stage_context_create` | 174 ns | Build the per-call evaluation context |
| `stage_funnel_clean` | 2.57 µs | Aho-Corasick funnel on plaintext |
| `stage_funnel_base64` | 3.99 µs | Same funnel after base64 unwrap |
| `stage_normalize` | 5.06 µs | Multi-step normalisation for encoded variants |
| `stage_l1_rules_safe` | 7.23 µs | L1 regex layer, safe input, all 263 rules |
| `stage_l1_rules_attack` | 7.22 µs | L1 regex layer, attack input |
| `stage_l2_dictionary` | 67 ns | L2 keyword dictionary scan |
| `stage_l3_signals` | 1.71 µs | L3 compound signal scoring |

Two things worth pointing at. The full pipeline costs more than the sum of its parts because of context propagation between layers. And L2 dictionary at 67 nanoseconds is doing single-pass Aho-Corasick across thousands of keywords, which is one of the reasons we put it before the more expensive regex layer.

One number that's not per-request

The bench output also reports compile_110_builtin_rules at 286ms. That's the one-time cost of parsing 13 TOML files, compiling 263 regex patterns, and validating taxonomy on gateway startup. It's amortised across every request the gateway serves for the rest of its uptime. You pay it once when the process starts. Then 44 microseconds forever.
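The pay-once pattern is standard Rust. A hedged sketch using `std::sync::LazyLock`; `RuleSet` and its contents are placeholders, not Clampd's actual startup path:

```rust
use std::sync::LazyLock;

// Stand-in for the compiled ruleset.
struct RuleSet {
    patterns: Vec<String>,
}

// Built on first use, then shared for the life of the process. In the real
// gateway this closure is the ~286 ms TOML parse + regex compile.
static RULES: LazyLock<RuleSet> = LazyLock::new(|| RuleSet {
    patterns: (0..263).map(|i| format!("rule_{i}")).collect(),
});

fn rule_count() -> usize {
    RULES.patterns.len() // every call after the first is a cheap deref
}
```

Every request after startup sees only the dereference, which is how a 286 ms compile amortises to nothing per call.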

The gateway: 5ms p50 end-to-end on 2 vCPUs

The engine bench tells you the cost of the rules. The gateway bench tells you what shows up at the wire. Auth, agent identification, normalisation, the gRPC roundtrip to the intent service, policy decision, scope token mint, audit emit. All of it.

We drove a 540-call burst against the production gateway with the request rate-limited so the demo org could absorb it. 42 calls produced complete stage-latency log lines. Smaller sample size than we wanted; the gaps are listed under "What we couldn't measure" below.

| Percentile | End-to-end |
|---|---|
| p50 | 5.14 ms |
| p90 | 6.80 ms |
| p95 | 7.92 ms |
| p99 | 10.53 ms |
| mean | 5.33 ms |
| n | 42 |

Per-stage breakdown shows nothing dominating. Every stage's p50 sits at or below a millisecond, which is roughly what we'd expect from gRPC roundtrips between services on the same box.

| Stage | p50 | p95 | What happens |
|---|---|---|---|
| `auth` | 0.55 ms | 1.38 ms | JWT verify, kill-list check |
| `license_ratelimit` | 0.81 ms | 1.85 ms | Per-org budget check in Redis |
| `identify_delegate` | 0.79 ms | 2.12 ms | gRPC to ag-registry, agent profile |
| `normalize_extract` | 0.75 ms | 2.75 ms | Decode and JSON-extract tool params |
| `session_toolauth` | 0.55 ms | 2.16 ms | Per-tool grant check, session budget |
| `boundary_check` | 0.38 ms | 0.70 ms | Trust boundary enforcement |
| `classify_grpc` | 1.00 ms | 1.76 ms | gRPC to ag-intent, the engine fires here |
| `impersonation_match` | 0.00 ms | 0.00 ms | Skipped for these calls |

That single 1.00ms classify_grpc figure includes the 44µs engine evaluation we measured on the laptop, plus the gRPC roundtrip. Most of that millisecond is network and serialisation. The actual engine, on the much slower production CPU, is still well inside that envelope.

The honest caveat: when the LLM Judge fires, latency changes

Clampd has an optional gray-zone escalation path. When a request scores between two configurable thresholds (default 0.2 and 0.75) and the rules engine isn't confident, the gateway hands the request to an LLM-as-Judge for a semantic second opinion. When that fires, end-to-end latency changes character.

Honest framing

From a smaller earlier sample (n=6) where the LLM Judge fallback fired, end-to-end latency was 200 to 400 milliseconds. Almost all of that is the upstream LLM API roundtrip, not Clampd. Our pipeline accounts for ~5ms of it. But if you write "sub-10ms" on a landing page and your reader benchmarks a Suspicious-class call, they'll see 300ms and quote you. So we don't write "sub-10ms" anywhere any more.

The escalation is opt-in, and we've published the configurable thresholds so teams can tune them or disable the layer entirely. In self-hosted deployments it ships off unless you set `MODEL_ESCALATION_ENABLED=true`.
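The gray-zone routing above reduces to a small decision function. A sketch with the thresholds from the post (0.2 and 0.75); the enum names and the disabled-path fallback are our assumptions, not Clampd's published behaviour:

```rust
#[derive(Debug, PartialEq)]
enum Verdict {
    Allow,
    Deny,
    EscalateToJudge, // only reachable when escalation is enabled
}

/// Route a rule-engine score through the configurable thresholds.
/// Scores in [low, high) are the gray zone where rules aren't confident.
fn route(score: f32, low: f32, high: f32, escalation_enabled: bool) -> Verdict {
    if score >= high {
        Verdict::Deny
    } else if score < low {
        Verdict::Allow
    } else if escalation_enabled {
        // Hand off to the LLM judge and accept the 200-400 ms upstream
        // roundtrip on this call.
        Verdict::EscalateToJudge
    } else {
        // Assumption for this sketch: with escalation disabled, gray-zone
        // calls fall back to a rules-only allow. The real disabled-path
        // policy is whatever the deployment configures.
        Verdict::Allow
    }
}
```

The latency story falls directly out of this shape: only the `EscalateToJudge` arm ever leaves single-digit milliseconds.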

What we couldn't measure

Before anyone runs the same script and gets different numbers, here's what's missing from this post: the allow path (every gateway figure above comes from deny-path calls), a proper 500-sample end-to-end run (only 42 of the 540 burst calls produced complete stage-latency lines), and a judge-path sample larger than n=6.

Reproducing this on your own machine

Engine bench, on any machine that can build the workspace.

```sh
# Clone, then from the services directory:
cargo bench -p ag-engine --bench evaluate
cargo bench -p ag-engine --bench stages
cargo bench -p ag-intent --bench classify

# Each writes a criterion HTML report under target/criterion/
```

End-to-end gateway bench, against a local docker stack.

```sh
# From the clampd/ compose directory:
docker compose --profile full up -d
sleep 30   # let services pass health checks

# 500 requests, 10 concurrent, against the safe path:
hey -n 500 -c 10 -m POST \
  -H "Content-Type: application/json" \
  -H "X-AG-Key: clmpd_demo_key" \
  -d '{"agent_id":"b0000000-0000-0000-0000-000000000001",
       "tool":"db.query","params":{"sql":"SELECT 1"}}' \
  http://localhost:8080/v1/proxy

# Per-stage breakdown lives in the gateway log:
docker compose logs ag-gateway 2>&1 \
  | grep "Stage latency breakdown" \
  | tail -500
```
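To turn scraped stage latencies into p50/p95 figures like the tables above, any percentile definition will do at these sample sizes; here's a nearest-rank helper (our choice of method, not necessarily the one we used for the post's tables):

```rust
/// Nearest-rank percentile over latency samples in milliseconds.
/// Sorts in place; `p` is in [0, 100]. Panics on an empty slice.
fn percentile(samples: &mut [f64], p: f64) -> f64 {
    samples.sort_by(|a, b| a.partial_cmp(b).unwrap());
    // rank = ceil(p/100 * n), clamped into [1, n], then 0-indexed.
    let rank = ((p / 100.0) * samples.len() as f64).ceil() as usize;
    samples[rank.saturating_sub(1).min(samples.len() - 1)]
}
```

At n=42, the difference between percentile definitions (nearest-rank vs interpolated) can move p95 by a sample or two, which is one more reason not to read these numbers too precisely.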

Numbers won't match ours exactly. Hardware, host noise, kernel version, and how warm the process and its caches are all matter. They should land in the same order of magnitude on a comparable machine.

What changed on the marketing site this week

Writing this post forced us to re-read our own copy. The phrase "sub-10ms typical latency" appeared in 13 places, including the meta description, the hero stat, two FAQ answers, and the JSON-LD schema. With a deny-path p95 of 7.92 ms and an unmeasured allow path, we couldn't honestly defend "typical" without more samples. So we changed it.

The site now leads with "44µs rule evaluation" (bench-backed and reproducible) and "single-digit ms end-to-end on commodity hardware" (deliberately wide enough to be true even after the 500-sample run lands). When we measure allow-path properly, we'll narrow the language. Until then, the homepage and this post agree.

Try Clampd in 60 seconds

One line of Python or TypeScript. Works with OpenAI, Anthropic, LangChain, CrewAI, Google ADK, and any MCP server. Self-hosted, source-available, no telemetry by default.

```sh
pip install clampd        # Python
npm install @clampd/sdk   # TypeScript
```