Pillar 2: Guardrails

What keeps them safe.

Claude Code agents are powerful. Without boundaries, they can push to production branches, bump versions without a release workflow, leak credentials into tool calls, loop infinitely on broken commands, or quietly bloat their own instruction files. Guardrails close all of these gaps — automatically, at the hook layer, before any damage reaches your codebase.

Your fleet operates within boundaries you set.

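Eight guardrails ship with agentihooks. Seven are active by default with no configuration required; the eighth, the output token limit, activates once CLAUDE_CODE_MAX_OUTPUT_TOKENS is set. They fire silently when everything is fine and block loudly when something would go wrong.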

Table of contents

  1. The guardrail pipeline
  2. Guardrail 1: Secrets scanning
    1. What it detects
    2. Suppression
    3. Configuration
  3. Guardrail 2: Retry circuit breaker
    1. How operations are fingerprinted
    2. Configuration
  4. Guardrail 3: Branch guard
    1. Blocked operations
    2. What is allowed
  5. Guardrail 4: Version guard
    1. Protected files
    2. Detected patterns
  6. Guardrail 5: CLAUDE.md sanity
    1. Configuration
  7. Guardrail 6: MCP surface area
    1. Configuration
  8. Guardrail 7: Output token limit
    1. Configuration
  9. Guardrail 8: File read deduplication
    1. Configuration
  10. All guardrails at a glance
  11. Exit code semantics
  12. Hardening your setup

The guardrail pipeline

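Every agent action passes through a layered defense before executing. The per-action pipeline runs at three Claude Code hook events: UserPromptSubmit, PreToolUse, and PostToolUse. Two further guardrails (MCP surface area and output token limit) run once at SessionStart and are not shown in the diagram.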

flowchart TD
    P([User Prompt]) --> UPS

    subgraph UserPromptSubmit
        UPS[Secrets Scanner\nprompt scan]
    end

    UPS --> TOOL([Tool about to execute])

    subgraph PreToolUse["PreToolUse — security gate"]
        S[Secrets Scanner\ntool input scan]
        B[Branch Guard\ngit command filter]
        V[Version Guard\nmanifest protection]
        C[CLAUDE.md Sanity\nline limit check]
        RC[Retry Circuit Breaker\nhard block]
        FR[File Read Dedup\nredundant read block]
    end

    TOOL --> S --> B --> V --> C --> RC --> FR

    S -->|secret found| BLOCK1([BLOCKED — exit 2])
    B -->|merge/force-push/tag| BLOCK2([BLOCKED — exit 2])
    V -->|version field edit| BLOCK3([BLOCKED — exit 2])
    C -->|file exceeds line cap| BLOCK4([BLOCKED — exit 2])
    RC -->|hard max hit| BLOCK5([BLOCKED — exit 2])
    FR -->|unchanged file| BLOCK6([BLOCKED — exit 2])

    FR -->|all clear| EXEC([Tool executes])

    subgraph PostToolUse["PostToolUse — learning + enforcement"]
        RCT[Retry Circuit Breaker\nfailure tracking]
        BA[Bash Output Filter\nverbose truncation]
    end

    EXEC --> RCT
    EXEC --> BA
    RCT -->|threshold hit| WARN([Inject research instructions])
    BA -->|verbose output| TRUNC([Truncated + re-emitted])

Guardrail 1: Secrets scanning

Hook: UserPromptSubmit (warn), PreToolUse (block)
Default: On (AGENTIHOOKS_SECRETS_MODE=standard)

The secrets scanner intercepts credentials before they can enter tool calls, log files, or git history. It runs twice per turn: once on the raw user prompt (warn only) and once on every tool’s input parameters (block on detection).

What it detects

| Pattern name | What it catches |
| --- | --- |
| aws_access_key | AKIA/ASIA/AROA/AIPA prefixed AWS key IDs |
| aws_secret_key | aws_secret_access_key = <value> assignments |
| github_token | ghp_, ghs_, github_pat_ tokens |
| private_key | PEM-encoded RSA, EC, OPENSSH, PGP private keys |
| bearer_token | Authorization: Bearer <token> headers |
| db_url_creds | postgres://user:pass@host, mysql://, mongodb:// URLs |
| generic_secret | PASSWORD=, API_KEY=, SECRET= assignments with 8+ char values |

In strict mode, three additional patterns activate:

| Pattern name | What it catches |
| --- | --- |
| slack_token | xox[bpors]-... Slack OAuth tokens |
| stripe_key | sk_live_, sk_test_, rk_live_, rk_test_ keys |
| jwt_token | Three-part base64url JWTs (eyJ...eyJ...) |

Suppression

Lines with # nosecret are excluded from scanning. Use this for documentation, test fixtures, or known-safe patterns:

example_key = "AKIAIOSFODNN7EXAMPLE"  # nosecret
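The detection and suppression behavior described above can be sketched in a few lines of Python. This is a minimal illustration, not the scanner agentihooks ships: the regular expressions, the `scan_text` function, and the choice to cover only three of the seven standard patterns are all assumptions made for the example.

```python
import re

# Illustrative subset of the standard patterns. These regexes are assumptions
# written for this sketch, not the ones agentihooks uses.
PATTERNS = {
    "aws_access_key": re.compile(r"\b(?:AKIA|ASIA|AROA|AIPA)[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\b(?:ghp_|ghs_|github_pat_)[A-Za-z0-9_]{20,}"),
    "generic_secret": re.compile(
        r"\b(?:PASSWORD|API_KEY|SECRET)\s*=\s*['\"]?[^\s'\"]{8,}"
    ),
}

SUPPRESS_MARKER = "# nosecret"


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line_number, pattern_name) for each hit on a non-suppressed line."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if SUPPRESS_MARKER in line:
            continue  # the whole line is excluded from scanning
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings
```

In this sketch a caller on the UserPromptSubmit side would only report the findings, while a caller on the PreToolUse side would turn a non-empty result into a block.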

Configuration

| Variable | Default | Options |
| --- | --- | --- |
| AGENTIHOOKS_SECRETS_MODE | standard | off, warn, standard, strict |
  • off — scanning disabled entirely
  • warn — detects and warns, never blocks
  • standard — detects standard 7 patterns, blocks on PreToolUse
  • strict — standard + Slack/Stripe/JWT, blocks on PreToolUse

Guardrail 2: Retry circuit breaker

Hook: PostToolUse (track + warn), PreToolUse (hard block)
Default: On (RETRY_BREAKER_ENABLED=true, soft at 5, hard at 10)

Claude Code agents will retry failing operations. Without a circuit breaker, a stuck agent can hammer the same broken command dozens of times — burning context, wasting tokens, and never escaping the loop.

The retry circuit breaker tracks consecutive failures per operation. When the same operation fails repeatedly, it escalates through two stages:

Stage 1 (soft warning, default: 5 failures): Injects a banner into Claude’s context telling it to stop retrying and instead launch parallel error-researcher agents for web search. The agent can still proceed, but it now knows it should research first.

Stage 2 (hard block, default: 10 failures): Raises a BlockAction via PreToolUse, preventing the tool from executing at all. The agent must research and change approach before the block lifts.
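The two stages can be modeled as a small state machine. The sketch below is an in-memory stand-in with hypothetical names (`RetryBreaker`, `Verdict`); it assumes that a success resets the streak, which follows from counting consecutive failures, and it does not model Redis persistence or how a hard block eventually lifts.

```python
from collections import defaultdict
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    WARN = "warn"    # soft stage: inject the research banner
    BLOCK = "block"  # hard stage: refuse to run the tool


class RetryBreaker:
    """In-memory sketch of the two-stage breaker (hypothetical API)."""

    def __init__(self, soft_max: int = 5, hard_max: int = 10) -> None:
        self.soft_max = soft_max
        self.hard_max = hard_max
        self.failures: dict[str, int] = defaultdict(int)

    def record(self, operation: str, succeeded: bool) -> Verdict:
        """PostToolUse side: update the consecutive-failure count."""
        if succeeded:
            self.failures.pop(operation, None)  # assumed: a success resets the streak
            return Verdict.ALLOW
        self.failures[operation] += 1
        if self.failures[operation] >= self.soft_max:
            return Verdict.WARN
        return Verdict.ALLOW

    def check(self, operation: str) -> Verdict:
        """PreToolUse side: block once the hard maximum has been reached."""
        if self.failures.get(operation, 0) >= self.hard_max:
            return Verdict.BLOCK
        return Verdict.ALLOW
```

Splitting the logic into `record` and `check` mirrors the hook split described above: failures are counted after a tool runs, and the hard block is enforced before the next attempt.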

How operations are fingerprinted

The breaker tracks per-operation, not per-tool. Operations are fingerprinted by their base command, with subcommand-level grouping for DevOps tools:

| Tool call | Operation key |
| --- | --- |
| Bash: kubectl apply -f deploy.yaml | bash:kubectl:apply |
| Bash: kubectl get pods | bash:kubectl:get |
| Bash: terraform plan | bash:terraform:plan |
| Bash: terraform apply | bash:terraform:apply |
| Bash: docker build . | bash:docker:build |
| Edit | edit |

This means kubectl apply and kubectl get have independent counters — a stuck apply doesn’t block unrelated reads.

Error text is also normalized (hex, timestamps, paths, numbers stripped) before fingerprinting, so slightly different error messages from the same root cause are treated as the same failure.
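As a rough illustration of both ideas, the following sketch derives an operation key and normalizes error text. The function names, the set of tools that get subcommand grouping, and the placeholder tokens are assumptions for this example; the real fingerprinting rules may differ.

```python
import re

# Tools whose subcommand becomes part of the key. This set is an assumption
# based on the examples in the table above.
SUBCOMMAND_TOOLS = {"kubectl", "terraform", "docker"}


def operation_key(tool_name: str, command: str = "") -> str:
    """Build a key such as 'bash:kubectl:apply' or 'edit'."""
    tool = tool_name.lower()
    parts = command.split()
    if tool != "bash" or not parts:
        return tool
    base = parts[0]
    if base in SUBCOMMAND_TOOLS and len(parts) > 1:
        return f"bash:{base}:{parts[1]}"
    return f"bash:{base}"


def normalize_error(text: str) -> str:
    """Replace volatile details so errors with the same root cause compare equal."""
    text = re.sub(r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}\S*", "<ts>", text)
    text = re.sub(r"0x[0-9a-fA-F]+", "<hex>", text)
    text = re.sub(r"(?:/[\w.\-]+)+", "<path>", text)
    text = re.sub(r"\d+", "<n>", text)
    return text
```

The substitutions run from most specific to least specific so that the generic number rule does not break up a timestamp or hex value before its own rule has seen it.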

Configuration

| Variable | Default | Description |
| --- | --- | --- |
| RETRY_BREAKER_ENABLED | true | Master switch |
| RETRY_BREAKER_MAX | 5 | Consecutive failures before soft warning |
| RETRY_BREAKER_HARD_MAX | 10 | Consecutive failures before hard block |
| RETRY_BREAKER_TTL | 3600 | Redis key TTL in seconds (1 hour) |

State is persisted in Redis per session and falls back to in-memory when Redis is unavailable.


Guardrail 3: Branch guard

Hook: PreToolUse (block, Bash commands only)
Default: On (always active)

Prevents destructive git operations that bypass the normal PR and release workflow. Fires on git commands before they execute.

Blocked operations

| Command pattern | Reason |
| --- | --- |
| git merge ... main / git merge ... master | Direct merges bypass PR review |
| git reset ... main / git reset ... master | Rewrites protected branch history |
| git push --force / git push -f | Can destroy remote history |
| git push --force-with-lease | Force push variant; still rewrites remote history |
| git tag | Tagging is a release operation; must go through CI |

All blocked operations return exit code 2 with a human-readable explanation and, where applicable, the recommended alternative (e.g., gh workflow run release.yml for tagging).

What is allowed

Normal git push (without force flags) is allowed; branch protection is a remote concern handled by GitHub. Routine operations (git checkout, git switch, git pull, git log, git status, git diff) are never touched.

The guard strips heredoc bodies and quoted commit messages before pattern matching to avoid false positives on message content that happens to mention “main” or “master”.
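The strip-then-match approach might look like the sketch below. The rule patterns, the `check_git_command` name, and the decision to drop every quoted string (not only commit messages) are simplifying assumptions; a production filter would need to handle more shell syntax than this.

```python
import re
from typing import Optional

# Block rules are illustrative approximations of the table above.
BLOCK_RULES = [
    (re.compile(r"\bgit\s+merge\b.*(?<![\w-])(?:main|master)(?![\w-])"),
     "direct merge involving a protected branch"),
    (re.compile(r"\bgit\s+reset\b.*(?<![\w-])(?:main|master)(?![\w-])"),
     "reset involving a protected branch"),
    (re.compile(r"\bgit\s+push\b.*(?:--force(?:-with-lease)?\b|\s-f\b)"),
     "force push"),
    (re.compile(r"\bgit\s+tag\b"), "tagging is a release operation"),
]

# Heredoc: keep the opening line, drop the body and the terminator line.
HEREDOC = re.compile(
    r"(<<-?\s*['\"]?(\w+)['\"]?[^\n]*\n).*?^[ \t]*\2[ \t]*$",
    re.DOTALL | re.MULTILINE,
)
DOUBLE_QUOTED = r'"(?:\\.|[^"\\])*"'
SINGLE_QUOTED = r"'[^']*'"
QUOTED = re.compile(DOUBLE_QUOTED + "|" + SINGLE_QUOTED)


def check_git_command(command: str) -> Optional[str]:
    """Return a block reason, or None when the command may run."""
    stripped = HEREDOC.sub(lambda m: m.group(1), command)
    stripped = QUOTED.sub("", stripped)  # removes commit messages and other quoted text
    for pattern, reason in BLOCK_RULES:
        if pattern.search(stripped):
            return reason
    return None
```

Matching against the stripped command is what keeps `git commit -m "merge main into docs"` from tripping the merge rule, at the cost of also ignoring anything else that happens to be quoted.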


Guardrail 4: Version guard

Hook: PreToolUse (block, Edit and Write tools)
Default: On (always active)

Version fields in project manifests should be managed by the CI release workflow, not by an AI agent editing files directly. The version guard blocks any Edit or Write operation that would modify a version field in a known manifest file.

Protected files

pyproject.toml, package.json, Cargo.toml, setup.cfg, setup.py, version.txt, VERSION

Detected patterns

version = "1.2.3"        # pyproject.toml, Cargo.toml, setup.cfg
"version": "1.2.3"       # package.json

If a Write or Edit targets one of these files and the new content contains a version field pattern, the operation is blocked with a message directing the agent to the release workflow:

BLOCKED: Version field modification in pyproject.toml is not allowed.
Version bumping is handled by the release workflow
(gh workflow run release.yml -f bump=patch|minor|major).
Do not edit version fields manually.
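A check of this shape can be sketched as a file-name test followed by a pattern test. The regex and the `violates_version_guard` name are assumptions for illustration. The sketch covers only the two key-value patterns listed above, so it does not model how bare version files such as version.txt or VERSION are handled.

```python
import re
from pathlib import PurePath

PROTECTED_FILES = {
    "pyproject.toml", "package.json", "Cargo.toml",
    "setup.cfg", "setup.py", "version.txt", "VERSION",
}

# Matches lines such as `version = "1.2.3"` and `"version": "1.2.3"`.
# This regex is an assumption, not the guard's actual pattern.
VERSION_FIELD = re.compile(
    r"""^\s*["']?version["']?\s*[:=]\s*["']?\d+(?:\.\d+)*""",
    re.MULTILINE,
)


def violates_version_guard(file_path: str, new_content: str) -> bool:
    """True when the target is a protected manifest and the content has a version field."""
    if PurePath(file_path).name not in PROTECTED_FILES:
        return False
    return bool(VERSION_FIELD.search(new_content))
```

Anchoring the pattern at the start of a line keeps keys such as `python_version` from being mistaken for the project version.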

Guardrail 5: CLAUDE.md sanity

Hook: PreToolUse (block, Edit and Write tools)
Default: On (AGENTIHOOKS_CLAUDE_MD_SANITY_CHECK=true)

CLAUDE.md is loaded by Claude Code on every turn. Every line in it costs tokens, every session. Agents writing unchecked content into CLAUDE.md files can quietly inflate the per-turn base cost while also degrading instruction quality through drift and dilution.

The sanity check intercepts any Write or Edit targeting a CLAUDE.md or CLAUDE.local.md file and simulates what the resulting file would look like. If the resulting line count exceeds the cap, the operation is blocked before the file is touched:

BLOCKED: Write to /home/user/.claude/CLAUDE.md would produce 347 lines,
exceeding the CLAUDE.md cap of 200 lines.
Trim the content to 200 lines or fewer before writing.

For Edit operations, the guard reads the current file from disk, applies the proposed change in memory, and counts the resulting lines — catching incremental bloat that would accumulate over many small edits.
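The simulate-then-count step for an Edit could be written roughly as follows. The function names are hypothetical, and the sketch assumes the edit arrives as an `old_string` / `new_string` pair that replaces the first occurrence; the file on disk is only read, never modified.

```python
from pathlib import Path

GUARDED_NAMES = {"CLAUDE.md", "CLAUDE.local.md"}


def projected_line_count(path: Path, old_string: str, new_string: str) -> int:
    """Apply an Edit-style replacement in memory and count the resulting lines."""
    current = path.read_text(encoding="utf-8") if path.exists() else ""
    projected = current.replace(old_string, new_string, 1)
    return len(projected.splitlines())


def exceeds_cap(path: Path, old_string: str, new_string: str,
                max_lines: int = 200) -> bool:
    """True when the edit targets a guarded file and would push it past the cap."""
    if path.name not in GUARDED_NAMES:
        return False
    return projected_line_count(path, old_string, new_string) > max_lines
```

Because the count is taken on the projected file and not on the size of the edit, a long series of one-line additions is caught at the moment the total crosses the cap.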

Configuration

| Variable | Default | Description |
| --- | --- | --- |
| AGENTIHOOKS_CLAUDE_MD_SANITY_CHECK | true | Enable/disable the guardrail |
| AGENTIHOOKS_CLAUDE_MD_MAXLINES | 200 | Maximum allowed lines in CLAUDE.md files |

Guardrail 6: MCP surface area

Hook: SessionStart (warn)
Default: On (MCP_TOOL_WARN_THRESHOLD=40)

Every MCP tool schema loaded into a session costs tokens — approximately 150 tokens per tool, every turn. With 9 MCP servers and 112 tools, that’s ~16,800 schema tokens injected into every single context turn before Claude writes a line of code.

At session start, agentihooks counts the total number of MCP tools across all configured servers. If the count exceeds MCP_TOOL_WARN_THRESHOLD (default 40), a warning banner fires:

MCP SURFACE AREA: 112 tools across 9 servers (~16,800 schema tokens/turn).
Consider disabling unused servers via /mcp to reduce per-turn overhead.
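The estimate behind the banner is simple arithmetic: total tools multiplied by the average schema size. A minimal sketch, with hypothetical function names and a plain dictionary standing in for the discovered server list:

```python
from typing import Optional


def schema_overhead(tools_per_server: dict[str, int],
                    avg_tokens: int = 150) -> tuple[int, int]:
    """Return (total_tools, estimated_schema_tokens_per_turn)."""
    total_tools = sum(tools_per_server.values())
    return total_tools, total_tools * avg_tokens


def surface_warning(tools_per_server: dict[str, int], threshold: int = 40,
                    avg_tokens: int = 150) -> Optional[str]:
    """Build the banner text when the tool count exceeds the threshold."""
    total, tokens = schema_overhead(tools_per_server, avg_tokens)
    if total <= threshold:
        return None
    return (
        f"MCP SURFACE AREA: {total} tools across {len(tools_per_server)} servers "
        f"(~{tokens:,} schema tokens/turn)."
    )
```

With the defaults, 112 tools work out to 112 × 150 = 16,800 estimated tokens, which matches the figure quoted above.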

The companion CLI command provides the full breakdown:

agentihooks mcp report
MCP Surface Area Report
Total: 9 servers, ~112 tools, ~16,800 schema tokens

Server                         Source   Tools   ~Tokens
hooks-utils                      user      32     4,800
github                           user      40     6,000
postgres                         user      15     2,250
...

Configuration

| Variable | Default | Description |
| --- | --- | --- |
| MCP_TOOL_WARN_THRESHOLD | 40 | Tool count threshold for session-start warning |
| MCP_SCHEMA_AVG_TOKENS | 150 | Estimated tokens per tool schema |

Guardrail 7: Output token limit

Hook: SessionStart (inject awareness)
Default: Passive (activates when CLAUDE_CODE_MAX_OUTPUT_TOKENS is set)

When CLAUDE_CODE_MAX_OUTPUT_TOKENS is set in the environment, agentihooks injects an awareness message into the session context at startup so Claude knows the limit is in effect and can plan accordingly:

OUTPUT TOKEN LIMIT: This session is capped at 8192 output tokens per response.
Plan responses to stay within this limit.

Without this injection, Claude can unknowingly start generating a response that will be cut off mid-output by the token limit, resulting in truncated tool calls, incomplete edits, or garbled output. The awareness injection prevents the surprise.
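The passive behavior amounts to a conditional on one environment variable. The sketch below shows that shape; `output_limit_notice` is a hypothetical name, and treating a non-numeric value as unset is an assumption of this example.

```python
import os
from typing import Mapping, Optional


def output_limit_notice(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the awareness message when the cap is set, else None."""
    raw = env.get("CLAUDE_CODE_MAX_OUTPUT_TOKENS", "").strip()
    if not raw.isdigit():
        return None  # unset or not a plain integer: stay passive
    return (
        f"OUTPUT TOKEN LIMIT: This session is capped at {int(raw)} "
        "output tokens per response.\n"
        "Plan responses to stay within this limit."
    )
```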

Configuration

| Variable | Default | Description |
| --- | --- | --- |
| CLAUDE_CODE_MAX_OUTPUT_TOKENS | (unset) | Output token cap; awareness injected when set |

Guardrail 8: File read deduplication

Hook: PreToolUse (block redundant reads), PostToolUse (cache on read)
Default: On (FILE_READ_CACHE_ENABLED=true)

Tracks every file read during a session. If Claude tries to read the same file again and the file has not changed on disk since the last read, the operation is blocked:

BLOCKED: /path/to/file.py was already read this session and is unchanged on disk.
Use the content already in your context window.

This prevents a common and expensive pattern: Claude re-reading the same source file 3–5 times over the course of a long session, each read injecting the same content into the context window.

File identity is tracked by path + mtime. A file modified on disk since the last read passes through normally, and the cache entry is updated.
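A path + mtime cache of this kind can be sketched with a dictionary. `ReadCache` and its method names are hypothetical, and the in-memory store stands in for whichever backend is configured; TTL expiry is not modeled.

```python
import os


class ReadCache:
    """Remembers the mtime seen at the last read of each path (sketch only)."""

    def __init__(self) -> None:
        self._seen: dict[str, int] = {}

    def should_block(self, path: str) -> bool:
        """PreToolUse side: block when the file was read and is unchanged on disk."""
        key = os.path.abspath(path)
        try:
            mtime = os.stat(key).st_mtime_ns
        except OSError:
            return False  # missing or unreadable file: let the read tool report it
        return self._seen.get(key) == mtime

    def record_read(self, path: str) -> None:
        """PostToolUse side: store the mtime observed for this read."""
        key = os.path.abspath(path)
        try:
            self._seen[key] = os.stat(key).st_mtime_ns
        except OSError:
            self._seen.pop(key, None)
```

Comparing the stored mtime for equality, not just for being newer, means any change to the timestamp lets the next read through and refreshes the entry.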

Configuration

| Variable | Default | Description |
| --- | --- | --- |
| FILE_READ_CACHE_ENABLED | true | Master switch |
| FILE_READ_CACHE_BACKEND | redis | redis or memory |
| FILE_READ_CACHE_TTL | 21600 | Redis key TTL (6 hours) |

All guardrails at a glance

| # | Guardrail | Hook event | Default | Block or warn | Config key |
| --- | --- | --- | --- | --- | --- |
| 1 | Secrets scanner | UserPromptSubmit / PreToolUse | On | Warn / Block | AGENTIHOOKS_SECRETS_MODE |
| 2 | Retry circuit breaker | PostToolUse / PreToolUse | On | Warn → Block | RETRY_BREAKER_ENABLED |
| 3 | Branch guard | PreToolUse | On | Block | (always on) |
| 4 | Version guard | PreToolUse | On | Block | (always on) |
| 5 | CLAUDE.md sanity | PreToolUse | On | Block | AGENTIHOOKS_CLAUDE_MD_SANITY_CHECK |
| 6 | MCP surface area | SessionStart | On | Warn | MCP_TOOL_WARN_THRESHOLD |
| 7 | Output token limit | SessionStart | Passive | Inject awareness | CLAUDE_CODE_MAX_OUTPUT_TOKENS |
| 8 | File read dedup | PreToolUse / PostToolUse | On | Block | FILE_READ_CACHE_ENABLED |

Exit code semantics

Guardrails that block communicate via exit code 2. Claude Code cancels the tool action and displays the hook’s stderr output as a warning to the agent.

| Exit code | Meaning |
| --- | --- |
| 0 | Allow: tool proceeds |
| 2 | Block: tool cancelled, stderr shown as warning |

This is the same mechanism Claude Code uses for all hook blocks — guardrails are first-class citizens of the hook event pipeline.
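To make the contract concrete, here is a minimal sketch of a guard that turns a check result into one of the two exit codes. It assumes the hook receives a JSON payload on stdin containing `tool_name` and `tool_input`; `run_guard` and `no_git_tag` are hypothetical names written for this example.

```python
import json
import sys
from typing import Callable, Optional, TextIO

ALLOW, BLOCK = 0, 2


def run_guard(check: Callable[[dict], Optional[str]],
              stdin: TextIO = sys.stdin,
              stderr: TextIO = sys.stderr) -> int:
    """Read the hook payload, run one check, and map its result to an exit code."""
    payload = json.load(stdin)
    reason = check(payload)
    if reason is None:
        return ALLOW
    print(f"BLOCKED: {reason}", file=stderr)  # stderr is what the agent sees
    return BLOCK


def no_git_tag(payload: dict) -> Optional[str]:
    """Example check: refuse Bash commands that start with `git tag`."""
    command = payload.get("tool_input", {}).get("command", "")
    if payload.get("tool_name") == "Bash" and command.strip().startswith("git tag"):
        return "tagging must go through the release workflow"
    return None
```

A standalone hook script built on this sketch would finish with `sys.exit(run_guard(no_git_tag))` so that the returned value becomes the process exit code.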


Hardening your setup

The defaults cover the most common failure modes. For teams or higher-stakes environments, consider these adjustments:

# ~/.agentihooks/.env

# Stricter secrets scanning (adds Slack, Stripe, JWT patterns)
AGENTIHOOKS_SECRETS_MODE=strict

# Tighter retry tolerance (warn sooner, hard block sooner)
RETRY_BREAKER_MAX=3
RETRY_BREAKER_HARD_MAX=7

# Tighter CLAUDE.md size cap
AGENTIHOOKS_CLAUDE_MD_MAXLINES=150

# Warn earlier on MCP surface area
MCP_TOOL_WARN_THRESHOLD=25

# Set output token ceiling
CLAUDE_CODE_MAX_OUTPUT_TOKENS=8192

To verify all guardrails are active after installation:

agentihooks status

The status output lists each guardrail with its current state (enabled/disabled), mode, and thresholds. Inside a live session, /agentihooks shows the same panel alongside real-time session metrics.