Cost Management

Stop burning through your Claude Code quota. AgentiHooks ships a full cost management layer that watches every token entering and leaving your context window – and actively intervenes to keep spending under control.

Users report noticeably slower quota consumption after enabling AgentiHooks. Nearly every feature below is on by default and works without configuration; console quota monitoring is the one opt-in.

Table of contents

  1. The problem
  2. What you save
  3. Feature breakdown
    1. Real-time cost display
    2. Bash output filtering
    3. File read deduplication
    4. Context threshold warnings
    5. MCP tool lazy loading
    6. MCP hygiene reminders
    7. Burn rate tracking
    8. Console quota monitoring (opt-in)
    9. Context audit tracking
    10. Smart compact suggestions
    11. Thinking/effort policy
    12. Peak/off-peak awareness
    13. MCP surface area reporting
    14. CLAUDE.md linting and skill extraction
  4. Everything at a glance
  5. Quick start

The problem

Claude Code is powerful – but it’s expensive. Without guardrails:

  • Verbose bash output floods the context – a single docker logs or npm install can dump 10K+ tokens into the window
  • Redundant file reads waste tokens – Claude re-reads the same unchanged file 3-5 times per session
  • No visibility into burn rate – you don’t know you’ve consumed 80% of your context until it’s too late and the session resets
  • No plan-level quota awareness – you hit your weekly limit mid-task with no warning

AgentiHooks solves all four.


What you save

| Feature | What it prevents | Estimated token savings |
| --- | --- | --- |
| Bash output filtering | Verbose docker/kubectl/git/test/build output flooding context | 5K-50K tokens per command |
| File read deduplication | Claude re-reading the same unchanged file multiple times | 2K-20K tokens per duplicate read |
| MCP lazy loading | 26 MCP tool schemas loaded upfront every turn | ~79K tokens per session |
| Smart compact suggestions | Generic “/compact” warnings that don’t tell you what to drop | Faster, more effective compaction |
| Context audit tracking | No visibility into which tools consume the most context | Informed compaction decisions |
| Thinking/effort policy | Extended thinking burning tens of thousands of output tokens | 10K-50K tokens per over-think |
| Peak hour awareness | Running expensive jobs during peak billing hours | Session budget preservation |
| MCP surface area reporting | Heavy MCP servers silently consuming context every turn | 10K-100K tokens per session |
| CLAUDE.md linting | Bloated CLAUDE.md costing tokens on every turn | 500-5K tokens per turn |
| MCP hygiene reminders | Unused MCP servers contributing schema tokens every turn | 10K-100K tokens per session |

A single session with all features active can save 100K-250K tokens compared to vanilla Claude Code. Over a week of heavy use, that’s the difference between hitting your quota on Wednesday vs. lasting through Friday.


Feature breakdown

1. Real-time cost display

Every turn, your status bar shows exactly what you’re spending:

############ 53% | Sonnet 4.6 | $0.1842 | 1h
ctx: 112K/200K | burn: 8K/turn | +42-12 | cache: 67% | main

Line 1: Context fill bar with color (green/yellow/red), model, cumulative session cost in USD, session duration.

Line 2: Raw token counts, burn rate per turn, lines changed, prompt cache hit ratio, git branch.

No setup required – active by default.
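Both lines are straightforward to derive from the turn's token totals. A minimal sketch of the rendering (function names and exact formatting are illustrative, not the hooks' actual API; color handling and the lines-changed field are omitted):

```python
def render_fill_bar(used: int, limit: int, width: int = 24) -> str:
    # Fill bar like statusline Line 1 (color handling omitted)
    pct = used / limit
    filled = round(pct * width)
    return "#" * filled + " " * (width - filled) + f" {pct:.0%}"

def render_line2(used: int, limit: int, burn: int, cache_hit: float, branch: str) -> str:
    # Raw counts for Line 2 (lines-changed field omitted for brevity)
    k = lambda n: f"{round(n / 1000)}K"
    return f"ctx: {k(used)}/{k(limit)} | burn: {k(burn)}/turn | cache: {cache_hit:.0%} | {branch}"
```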


2. Bash output filtering

Detects verbose command output and truncates it before it enters the context window:

| Command type | Detection | Truncation |
| --- | --- | --- |
| docker logs / docker compose logs | Command string match | Last 50 lines (configurable) |
| kubectl commands | Command string match | Last 50 lines |
| git log | Command string match | Last 20 commits |
| pytest / jest / npm test / cargo test | Output pattern match | Last 10 failure blocks |
| npm install / pip install / cargo build | Command string match | Last 5000 chars |
| Everything else | Fallback | Hard cap at 5000 chars |

The filter adds a clear notice when truncating:

[truncated: kept last 50 of 2847 lines]

Claude sees the most recent, most relevant output – not the entire history.

Config:

| Variable | Default | What it controls |
| --- | --- | --- |
| BASH_FILTER_ENABLED | true | Master switch |
| BASH_FILTER_MAX_LINES | 50 | Docker/kubectl line limit |
| BASH_FILTER_MAX_CHARS | 5000 | Build output char cap |
| BASH_FILTER_TEST_MAX_FAILURES | 10 | Test failure block limit |
| BASH_FILTER_GIT_MAX_COMMITS | 20 | Git log commit limit |
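The tail-keeping logic can be sketched in a few lines. This is a simplified two-rule version for illustration only – the real filter matches many more command types:

```python
def truncate_output(command: str, output: str,
                    max_lines: int = 50, max_chars: int = 5000) -> str:
    # Keep only the tail of verbose output, with a notice about what was dropped
    lines = output.splitlines()
    if ("docker" in command or "kubectl" in command) and len(lines) > max_lines:
        notice = f"[truncated: kept last {max_lines} of {len(lines)} lines]"
        return "\n".join([notice] + lines[-max_lines:])
    if len(output) > max_chars:
        # Fallback: hard cap by characters, keeping the most recent output
        return "[truncated]\n" + output[-max_chars:]
    return output
```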

3. File read deduplication

Blocks Claude from re-reading files it already has in context – the single biggest source of wasted tokens in long sessions.

How it works:

  1. On every Read tool call, the cache records the file path and its mtime
  2. If Claude tries to read the same file again, the hook checks whether the file has been modified since
  3. Modified? Read goes through, mtime is updated
  4. Unchanged? Read is blocked with a message: “File already read this session and unchanged on disk”

The cache uses Redis when available (persists across hook invocations) with an in-memory fallback.

Config:

| Variable | Default | What it controls |
| --- | --- | --- |
| FILE_READ_CACHE_ENABLED | true | Master switch |
| FILE_READ_CACHE_BACKEND | redis | redis or memory |
| FILE_READ_CACHE_TTL | 21600 | Redis key TTL (6 hours) |

4. Context threshold warnings

Edge-triggered warnings fire exactly once per threshold crossing per session – no spam, no missed alerts:

| Threshold | Default | What happens |
| --- | --- | --- |
| Warning | 60% | Yellow banner: “CONTEXT 60% – consider /compact soon” |
| Critical | 80% | Red banner: “CONTEXT 80% – /compact now or start new session” |

Warnings appear on statusline Line 3 so they’re impossible to miss. Edge-triggering is tracked in Redis – each level fires at most once.

Config:

| Variable | Default |
| --- | --- |
| TOKEN_WARN_PCT | 60 |
| TOKEN_CRITICAL_PCT | 80 |
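Edge-triggering boils down to remembering which levels have already fired. A sketch, with a plain set standing in for the Redis-backed state (names are mine, not the hooks'):

```python
def check_thresholds(fill_pct: float, fired: set[int],
                     warn: int = 60, critical: int = 80) -> list[str]:
    # Return only newly crossed warnings; `fired` persists across turns
    msgs = []
    for level, text in ((critical, "/compact now or start new session"),
                        (warn, "consider /compact soon")):
        if fill_pct >= level and level not in fired:
            fired.add(level)
            msgs.append(f"CONTEXT {level}% – {text}")
    return msgs
```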

5. MCP tool lazy loading

With ENABLE_TOOL_SEARCH=true (the default), all 26 MCP tools load on demand instead of injecting their full JSON schemas into every turn.

Before: ~79K tokens of tool schemas loaded upfront, every single turn. After: Tools appear as “(loaded on-demand)” and only expand when Claude actually uses them.

This is set in the env block of settings.json – the installer configures it automatically.


6. MCP hygiene reminders

At session start, AgentiHooks injects a reminder prompting Claude to check /mcp and disable any MCP servers not needed for the current task. Each disabled server saves its schema tokens on every subsequent turn.

Config: MCP_HYGIENE_ENABLED=true (default)


7. Burn rate tracking

Every turn, the statusline computes how many tokens were consumed since the previous turn:

burn: 8K/turn

This lets you spot runaway token consumption in real time – a burn: 45K/turn after a bash command tells you something verbose just hit the context. Cross-turn delta computation requires Redis; without it, the metric is omitted gracefully.


8. Console quota monitoring (opt-in)

A background daemon scrapes your Claude.ai usage page and surfaces plan-level quota on statusline Line 3:

quota: session:53% [1h] | all:35% resets fri 10:00 am | sonnet:5% resets mon 12:00 am | extra: $40/99 (40%) resets apr 1

You see your weekly quota percentage, per-model breakdown, extra usage spend, and reset times – all color-coded (green < 60%, yellow < 80%, red above).

Multi-account support: You can authenticate multiple Claude.ai accounts and switch between them:

agentihooks quota auth work          # authenticate as "work"
agentihooks quota auth personal      # authenticate as "personal"
agentihooks quota list               # show all accounts
agentihooks quota switch personal    # switch active account

Account credentials are stored at ~/.agentihooks/quota-accounts/<name>.json.

How it works:

  1. agentihooks quota auth opens your real browser to claude.ai, prompts you to paste the sessionKey cookie, and saves credentials
  2. A headless Chromium daemon (scripts/claude_usage_watcher.py) starts in the background, loading the saved auth state once at startup
  3. Every CLAUDE_USAGE_POLL_SEC seconds it navigates to claude.ai/settings/usage, parses usage data from the page, and writes it atomically to CLAUDE_USAGE_FILE
  4. hooks/statusline.py reads that JSON file each turn and renders Line 3
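The atomic write in step 3 is presumably the standard write-to-temp-then-rename pattern, which guarantees the statusline never reads a half-written file. A self-contained sketch (not the daemon's actual code):

```python
import json
import os
import tempfile

def write_quota_atomically(data: dict, dest: str) -> None:
    # Write JSON to a temp file in the same directory, then rename into place.
    # os.replace is atomic on POSIX when source and dest share a filesystem.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(dest) or ".")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
        os.replace(tmp, dest)
    except BaseException:
        os.unlink(tmp)  # clean up the temp file on any failure
        raise
```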

Session cookie expiry: Playwright loads the auth state once at startup. If your session cookie expires, the daemon silently stops finding data – it does not re-authenticate. Run agentihooks quota auth again to refresh; it automatically kills the stale daemon and starts a fresh one with the new credentials.

Setup:

# One-time: install headless browser (playwright is a core dependency)
~/.agentihooks/.venv/bin/python -m playwright install chromium

# Authenticate: opens your browser, paste the sessionKey cookie when prompted
agentihooks quota auth

# Enable display in ~/.agentihooks/.env
echo 'CLAUDE_USAGE_FILE=~/.agentihooks/claude_usage.json' >> ~/.agentihooks/.env

CLI commands:

| Command | What it does |
| --- | --- |
| agentihooks quota | Start the background daemon |
| agentihooks quota auth [name] | Authenticate an account (kills + restarts daemon automatically) |
| agentihooks quota list | Show all configured accounts |
| agentihooks quota switch <name> | Switch active account |
| agentihooks quota restart | Restart daemon with current account |
| agentihooks quota status | Print the last known quota JSON |
| agentihooks quota logs | Tail the daemon log |
| agentihooks quota stop | Kill the daemon |
| agentihooks quota remove <name> | Remove an account |
| agentihooks quota dump-html | Dump raw usage page HTML for debugging |

Config:

| Variable | Default | What it controls |
| --- | --- | --- |
| CLAUDE_USAGE_FILE | (unset) | Path to quota JSON (setting it enables the feature) |
| CLAUDE_USAGE_POLL_SEC | 60 | Daemon poll interval |
| CLAUDE_USAGE_STALE_SEC | 300 | Data staleness threshold – statusline shows “stale” if data is older than this |

Troubleshooting – “No quota data found” in daemon log:

The daemon logs this when it successfully loads the page but finds no usage data – typically because the page structure changed or the session cookie expired.

# 1. Check if the daemon is running and what it last scraped
agentihooks quota status
agentihooks quota logs

# 2. Dump the raw page HTML to inspect what Playwright is actually seeing
agentihooks quota dump-html
grep -i 'session\|usage\|progress\|percent' ~/.agentihooks/usage_debug.html

# 3. If the cookie is stale, re-authenticate (auto-restarts daemon)
agentihooks quota auth

9. Context audit tracking

Tracks cumulative byte output per tool type across the session. When context fill exceeds the audit threshold on Stop, a report is logged showing the top 5 consumers.

Context audit (fill: 82%, total tool output: 245K):
  Read: 120K (49%)
  Bash: 65K (27%)
  Agent: 38K (16%)
  Edit: 12K (5%)
  Grep: 10K (4%)

Config:

| Variable | Default | What it controls |
| --- | --- | --- |
| CONTEXT_AUDIT_ENABLED | true | Enable per-tool tracking |
| CONTEXT_AUDIT_THRESHOLD_PCT | 70 | Emit report when fill exceeds this % |
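The report formatting reduces to sorting per-tool byte counters. A sketch that reproduces the sample output above (function name is illustrative):

```python
def audit_report(usage: dict[str, int], fill_pct: int, top_n: int = 5) -> str:
    # Top-N tool consumers by cumulative output bytes, largest first
    total = sum(usage.values())
    lines = [f"Context audit (fill: {fill_pct}%, total tool output: {round(total / 1000)}K):"]
    for tool, n in sorted(usage.items(), key=lambda kv: -kv[1])[:top_n]:
        lines.append(f"  {tool}: {round(n / 1000)}K ({round(n * 100 / total)}%)")
    return "\n".join(lines)
```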

10. Smart compact suggestions

Replaces generic “/compact” warnings with actionable suggestions based on context audit data:

CONTEXT 65% -- consider /compact soon -- top consumers: Read (50K), Bash (32K), Agent (28K)

Config:

| Variable | Default | What it controls |
| --- | --- | --- |
| COMPACT_SUGGEST_ENABLED | true | Use smart suggestions vs generic warnings |

11. Thinking/effort policy

Injects effort guidance at session start based on profile settings. Warns when subagents are spawned with unnecessarily expensive models.

TOKEN EFFICIENCY: Default effort: medium. Reserve high/ultrathink for complex
architectural decisions. Prefer Sonnet for implementation; reserve Opus for planning.

Config:

| Variable | Default | What it controls |
| --- | --- | --- |
| EFFORT_POLICY_ENABLED | true | Inject effort guidance at session start |
| DEFAULT_EFFORT | medium | Default reasoning depth (low/medium/high) |
| THINKING_BUDGET_TOKENS | 0 | Advisory token ceiling (0 = no limit) |

12. Peak/off-peak awareness

Detects Anthropic’s peak billing hours (weekday business hours US Pacific) and shows an indicator on the statusline. When session usage is high during peak hours, adds a warning.

quota: session:62% [1h] | PEAK -- sessions burn faster during business hours

Config:

| Variable | Default | What it controls |
| --- | --- | --- |
| PEAK_HOURS_ENABLED | true | Show peak indicator on statusline |
| PEAK_HOURS_START | 9 | Peak start hour |
| PEAK_HOURS_END | 17 | Peak end hour |
| PEAK_HOURS_TZ | US/Pacific | Timezone for peak calculation |
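The check itself is a small timezone calculation. A sketch using the IANA key America/Los_Angeles (the PEAK_HOURS_TZ default US/Pacific is a backward-compatibility alias for it); the function name is mine:

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # Python 3.9+

def is_peak(now: datetime, start: int = 9, end: int = 17,
            tz: str = "America/Los_Angeles") -> bool:
    # Peak = weekday business hours in the billing timezone
    local = now.astimezone(ZoneInfo(tz))
    return local.weekday() < 5 and start <= local.hour < end
```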

13. MCP surface area reporting

CLI tool that analyzes MCP server configurations and reports estimated token overhead. Also warns at session start if total tools exceed a threshold.

agentihooks mcp report
MCP Surface Area Report
Total: 9 servers, ~112 tools, ~16,800 schema tokens

Server                         Source   Tools   ~Tokens
hooks-utils                      user      32     4,800
github                           user      40     6,000
...

Config:

| Variable | Default | What it controls |
| --- | --- | --- |
| MCP_TOOL_WARN_THRESHOLD | 40 | Warn at session start if total tools exceed this |
| MCP_SCHEMA_AVG_TOKENS | 150 | Estimated tokens per tool schema |
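The estimate is simply tool count times average schema size. A sketch reproducing the totals in the sample report above (the remaining seven servers are collapsed into one placeholder entry for brevity):

```python
def estimate_mcp_overhead(tool_counts: dict[str, int],
                          avg_tokens: int = 150) -> tuple[int, int]:
    # (total tools, estimated schema tokens) across all configured servers
    total_tools = sum(tool_counts.values())
    return total_tools, total_tools * avg_tokens
```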

14. CLAUDE.md linting and skill extraction

CLI tool that analyzes CLAUDE.md token cost and suggests extracting workflow-specific sections into on-demand skills.

agentihooks lint-claude                           # analyze ~/.claude/CLAUDE.md
agentihooks extract-skill "Commands" --name cmds  # extract to skill

Moving workflow-specific content from CLAUDE.md (loaded every turn) to skills (loaded on demand) reduces base context cost by 500-5K tokens per turn.
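A hypothetical sketch of the kind of estimate lint-claude might compute, using the common ~4 characters-per-token heuristic (the section-splitting rule and warning threshold are my assumptions, not the tool's actual behavior):

```python
import re

def lint_sections(md: str, warn_tokens: int = 500,
                  chars_per_token: int = 4) -> list[tuple[str, int]]:
    # Flag '## ' sections whose estimated per-turn cost exceeds warn_tokens
    flagged = []
    for chunk in re.split(r"(?m)^## ", md)[1:]:
        title = chunk.splitlines()[0]
        est_tokens = len(chunk) // chars_per_token
        if est_tokens >= warn_tokens:
            flagged.append((title, est_tokens))
    return flagged
```

Sections flagged this way are the candidates for extract-skill: they cost tokens on every turn in CLAUDE.md but only on demand as a skill.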


Everything at a glance

| Layer | Feature | Default | Tokens saved | Config |
| --- | --- | --- | --- | --- |
| Output | Bash output filtering | On | 5K-50K/cmd | BASH_FILTER_ENABLED |
| Input | File read dedup | On | 2K-20K/read | FILE_READ_CACHE_ENABLED |
| Schema | MCP lazy loading | On | ~79K/session | ENABLE_TOOL_SEARCH |
| Schema | MCP hygiene reminder | On | 10K-100K/session | MCP_HYGIENE_ENABLED |
| Schema | MCP surface area reporting | On | 10K-100K/session | MCP_TOOL_WARN_THRESHOLD |
| Awareness | Statusline cost/burn | On | Prevents waste | TOKEN_MONITOR_ENABLED |
| Awareness | Context warnings (smart) | On | Prevents resets | COMPACT_SUGGEST_ENABLED |
| Awareness | Context audit | On | Informed compaction | CONTEXT_AUDIT_ENABLED |
| Awareness | Peak hour indicator | On | Budget preservation | PEAK_HOURS_ENABLED |
| Awareness | Console quota display | Opt-in | Prevents limit hits | CLAUDE_USAGE_FILE |
| Decode | Thinking/effort policy | On | 10K-50K/over-think | EFFORT_POLICY_ENABLED |
| Startup | CLAUDE.md linting | CLI | 500-5K/turn | agentihooks lint-claude |

Master switch: Set TOKEN_CONTROL_ENABLED=false to disable all token control features at once. Individual features can be toggled independently.


Quick start

Everything except quota monitoring works out of the box after installation:

# Install agentihooks -- all cost features are enabled by default
agentihooks init

To add quota monitoring (playwright ships with the package):

~/.agentihooks/.venv/bin/python -m playwright install chromium
agentihooks quota auth
echo 'CLAUDE_USAGE_FILE=~/.agentihooks/claude_usage.json' >> ~/.agentihooks/.env

Verify everything is working:

agentihooks status

This shows your full system health: profile, hooks, Python, daemons, Redis, OTEL, all 6 cost guardrails with descriptions, your entire MCP fleet with real tool counts (queried via MCP protocol, cached 1h), per-project enabled/disabled state, and quota summary with peak/off-peak indicator.

Inside a Claude session, use /agentihooks for the same diagnostics plus live session metrics (context fill, burn rate, per-tool consumption).