Talk to anyone who has been on a security team for more than five years and they'll tell you the same thing: data leaks happen in the boring direction more often than the dramatic one. The customer table doesn't get exfiltrated by an attacker who breaks in. It gets emailed to the wrong person. Pasted into Slack. Fed to a third-party "helpful summarisation" tool. The interesting question for any security control is not "can you detect the egress payload?" but "at which boundaries do you actually look?"

For AI agents, there are two boundaries per tool call that matter for PII.

Direction 1: what the agent sends out to the tool

The tool-call params. Example: the agent passes a customer's email and full name to a third-party translation API as part of "translate this support ticket." PII just left your network.

Direction 2: what the tool sends back to the agent

The tool's response payload. Example: the agent calls a database tool that returns 500 customer rows including SSNs. The LLM now has those SSNs in its context window, and may include them in its answer to the user, log them, or pass them to the next tool.

Both boundaries leak. They leak through different mechanisms. They need different defences. A tool that watches one direction and not the other is structurally incomplete.
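A minimal sketch of what inspecting both boundaries looks like in practice (hypothetical names throughout — this is not Clampd's API, just the shape of a bidirectional wrapper, with one email pattern standing in for a full PII set):

```python
import re

# Hypothetical sketch of a bidirectional gateway wrapper -- not
# Clampd's actual API. One pattern stands in for a full PII set.
EMAIL = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

def scan_for_pii(payload: str) -> bool:
    """Direction-agnostic check: does this payload contain PII?"""
    return bool(EMAIL.search(payload))

def guarded_call(tool, params: str):
    # Direction 1: inspect what the agent sends OUT to the tool.
    outbound_pii = scan_for_pii(params)
    result = tool(params)
    # Direction 2: inspect what the tool sends BACK, before the
    # payload ever reaches the LLM's context window.
    inbound_pii = scan_for_pii(result)
    return result, {"outbound_pii": outbound_pii, "inbound_pii": inbound_pii}

# A clean query whose *response* leaks a customer email:
result, flags = guarded_call(
    lambda q: "ticket 42: contact alice@example.com",
    "SELECT * FROM tickets",
)
```

Here the request params are clean but the response is not; a request-only inspector reports nothing.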

Why most tools only watch direction 1

Three architectural reasons explain why direction 2 (response) is harder, and why most products skip it.

  1. Where the product sits. An LLM-side moderation API or content classifier sees the prompt and the model output, not the tool roundtrip. By the time a tool returns, the data has already entered the model's context. The moderation product never sees it.
  2. Tool-call output structure. When an LLM calls a tool, the result that comes back is structured (JSON, tool-result blocks). Generic content moderators assume free-form text in/out and don't natively walk structured tool-output payloads. One large cloud gateway's docs are explicit about this: "This filter supports only text output and will not detect PII information when models respond with tool use (function call) output parameters via supported APIs." They documented the gap. Their architecture doesn't cover it.
  3. Cost and latency. Inspecting every tool response with a regex set, let alone an ML classifier, adds milliseconds of latency and bytes scanned on every call. If the product's positioning is "low-latency moderation API", it has an incentive to skip the response side.

The reasons are real. The result is still that direction 2 is mostly unguarded in production.

What we do

Clampd sits inline between the agent and the tool, so it sees both directions of every tool call. It runs detection on both.

Direction 1: tool call params

Every tool call goes through the rules engine before it leaves. The engine's PII rules include patterns for emails, SSNs, credit cards, phone numbers, region-specific identifiers (Aadhaar, PAN, NRIC, INSEE/NIR, UK NIN), and HIPAA-specific identifiers (medical record numbers, dates of birth in PHI context). There are also broader rules for "PII in plaintext", "PII in write operation", and "PII outbound exfiltration" that flag bulk data flowing to outbound categories like net:webhook:send or comms:email:send.

The rule that fires depends on the tool category. The same email-in-params will produce a low-risk flag if the tool is comms:email:send (where it's expected) and a high-risk block if the tool is net:webhook:send (where it isn't).
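As a sketch of what category-scoped severity could look like in config (the schema and field names below are illustrative assumptions, not the actual pii.toml syntax):

```toml
# Illustrative only: field names and schema here are assumptions,
# not Clampd's actual pii.toml format.

# Same pattern, low severity where email is expected...
[[rules]]
id = "flag-pii-in-email-send"
pattern = "email_address"
tool_category = "comms:email:send"
action = "flag"
severity = "low"

# ...high severity where it isn't.
[[rules]]
id = "block-pii-to-webhook"
pattern = "email_address"
tool_category = "net:webhook:send"
action = "block"
severity = "high"
```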

Direction 2: tool responses

Every response from a downstream tool is scanned by a separate response inspector before the agent sees it. The scanner uses a precompiled regex set that matches:

// services/crates/ag-gateway/src/response_inspector.rs
static PII_PATTERNS: LazyLock<RegexSet> = LazyLock::new(|| {
    RegexSet::new([
        // Email addresses
        r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}",
        // US Social Security Numbers
        r"\b\d{3}[-\s]?\d{2}[-\s]?\d{4}\b",
        // Credit card numbers (Visa, MC, Amex, Discover patterns)
        r"\b(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|...)\b",
        // Credit card with separators
        r"\b\d{4}[\s-]\d{4}[\s-]\d{4}[\s-]\d{4}\b",
        // US phone numbers (various formats)
        r"\b(?:\+?1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b",
        // HIPAA: Medical Record Number (with context)
        r"(?i)(?:MRN|MR|PAT(?:IENT)?)\s?[#:\s-]*...",
        // HIPAA: Date of Birth (with context keywords)
        r"(?i)(?:date.of.birth|dob|birth.?date|born)\s*...",
        // HIPAA: VIN (Vehicle Identification Number)
        r"\b[A-HJ-NPR-Z0-9]{17}\b",
        // GDPR: IBAN
        r"\b[A-Z]{2}\d{2}\s?[\dA-Z]{4}...",
        // GDPR: UK National Insurance Number
        r"(?i)\b[A-CEGHJ-PR-TW-Z]...",
        // GDPR: French INSEE/NIR (13+2 digits)
        r"\b[12]\s?\d{2}...",
    ])
    .expect("PII regex patterns must compile")
});

If the scanner finds a hit, the response is flagged with contains_pii_patterns=true, the tool result is annotated, and downstream rules can decide whether to block, redact, or just record. Critically, this happens before the response data gets back to the agent's LLM context.
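Because tool results are structured (JSON, tool-result blocks) rather than free-form text, the inspector has to walk the payload rather than scan a single string. A rough Python analogue of that walk (assumed behaviour for illustration, not the actual Rust inspector):

```python
import json
import re

# Sketch (assumed, not Clampd's code): walk a structured tool-result
# payload and run PII patterns over every string leaf -- the step a
# free-text moderator skips entirely.
PATTERNS = [
    re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"),  # email
    re.compile(r"\b\d{3}[-\s]?\d{2}[-\s]?\d{4}\b"),                  # US SSN
]

def contains_pii_patterns(node) -> bool:
    if isinstance(node, str):
        return any(p.search(node) for p in PATTERNS)
    if isinstance(node, dict):
        return any(contains_pii_patterns(v) for v in node.values())
    if isinstance(node, list):
        return any(contains_pii_patterns(v) for v in node)
    return False  # numbers, bools, null carry no text to scan

response = json.loads('{"rows": [{"name": "Ada", "ssn": "123-45-6789"}]}')
```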

The rules engine sees both

The PII rules in pii.toml are written to fire on either direction. flag-pii-in-write-operation fires when PII appears in the params of a write-class tool call. block-pii-in-llm-output fires when the rules engine processes a response with PII heading toward an LLM context.

Concretely, the engine's coverage (drawn directly from the TOML) adds up to 15 PII-specific rules, plus the response inspector's regex set, plus the sensitive_data_flow session pattern watching for PII-tagged data crossing categories.

Where we don't lead

Honest framing

One major cloud AI gateway documents 30+ specific PII entity types: ADDRESS, NAME, EMAIL, PHONE, USERNAME, PASSWORD, DRIVER_ID, LICENSE_PLATE, VIN, IBAN, SWIFT_CODE, IP_ADDRESS, MAC_ADDRESS, US/CA/UK-specific identifiers, and many more. We cover the common set (email, SSN, credit card, phone, MRN, DOB, VIN, IBAN, UK NIN, French INSEE/NIR) plus regional patterns relevant to our customer base. We are narrower in entity-type breadth.

Our wedge is not "more entity types." It's "in both directions." A product with 30+ PII types but no tool-output coverage misses entire scenarios. A product with 11 patterns but bidirectional coverage catches the cases where the PII originates from the tool, not the user.

The scenario this matters most for

Here's a realistic attack shape that direction-1 inspection alone doesn't catch.

An agent has read access to a customer database (legitimate) and the ability to send Slack messages (legitimate). The user asks "summarise the urgent support tickets from this week." The agent calls db.query, gets back 12 tickets, each with the customer's name, email, and ticket text. The agent then formats a Slack message and calls slack.send with the summary.

Direction 1 inspection: the db.query params are clean ("SELECT * FROM tickets WHERE created > ..."). The slack.send params contain the summary, which may or may not contain PII depending on what the LLM included.

Without direction 2: the 12 tickets came back through the gateway with PII in them. We didn't see it. The summary may strip the PII, or it may not. We're trusting the LLM to make a security judgment about what to include in the Slack message.

With direction 2: the response inspector flagged contains_pii_patterns=true on the db.query response. That flag enters the session context. The subsequent slack.send call now sees a session with sensitive-data-touched-recently, and the sensitive_data_flow session pattern fires before the message goes to Slack. Block, alert, manual review.

The key bit: the LLM was never the security control. The data path was the security control.
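The taint-style flow above can be sketched in a few lines (hypothetical class, method names, and the comms:slack:send category are assumptions for illustration; the real sensitive_data_flow pattern lives in the rules engine):

```python
# Hypothetical sketch of a sensitive_data_flow-style session pattern:
# a PII hit on one tool's response taints the session, and a later
# outbound-category call is blocked. Not Clampd's actual engine.
OUTBOUND = {"comms:slack:send", "comms:email:send", "net:webhook:send"}

class Session:
    def __init__(self):
        self.pii_seen = False

    def record_response(self, contains_pii: bool):
        if contains_pii:
            self.pii_seen = True  # taint persists across tool calls

    def decide(self, tool_category: str) -> str:
        # Sensitive data touched recently + outbound destination = block.
        if tool_category in OUTBOUND and self.pii_seen:
            return "block"
        return "allow"

s = Session()
s.record_response(contains_pii=True)  # db.query returned PII rows
```

The decision is made on the data path, not by trusting the LLM's summary.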

What you can do without Clampd

Three things to apply this week, regardless of vendor:

  1. Map your boundaries. For each agent tool, write down which direction your current controls actually inspect: request params, response payloads, both, or neither. Most stacks turn out to watch direction 1 only.
  2. Scan tool responses before they re-enter the model's context. Even a basic regex pass for emails, SSNs, and card numbers catches the cases where the PII originates from the tool, not the user.
  3. Treat a PII hit as session state. A flagged response should influence later outbound calls (Slack, email, webhooks) in the same session, not just the call it arrived on.

The security-team framing

If you're talking to a vendor about AI agent PII coverage, two questions cut through marketing language:

  1. "Does your detector run on tool-call response payloads, not just tool-call request parameters?" If they hesitate, the answer is "no."
  2. "What happens when a database tool returns 500 rows of customer data, and a subsequent tool sends a summary to Slack? Walk me through which detector fires when, with line numbers in your code if possible." This forces a real architectural answer.

If you can't get a clean answer to either, the product probably watches one boundary and not the other.

Try Clampd in 60 seconds

One line of Python or TypeScript. Tool-call params and tool-call responses both inspected. PII tagged in session context, session patterns fire on cross-tool sensitive-data flow. Self-hosted, source-available.

pip install clampd
npm install @clampd/sdk
Get Started → Compare to alternatives