Cost Management
Stop burning through your Claude Code quota. AgentiHooks ships a full cost management layer that watches every token entering and leaving your context window – and actively intervenes to keep spending under control.
Users report noticeably slower quota consumption after enabling AgentiHooks. Every feature below is on by default and works without configuration.
Table of contents
- The problem
- What you save
- Feature breakdown
- 1. Real-time cost display
- 2. Bash output filtering
- 3. File read deduplication
- 4. Context threshold warnings
- 5. MCP tool lazy loading
- 6. MCP hygiene reminders
- 7. Burn rate tracking
- 8. Console quota monitoring (opt-in)
- 9. Context audit tracking
- 10. Smart compact suggestions
- 11. Thinking/effort policy
- 12. Peak/off-peak awareness
- 13. MCP surface area reporting
- 14. CLAUDE.md linting and skill extraction
- Everything at a glance
- Quick start
The problem
Claude Code is powerful – but it’s expensive. Without guardrails:
- Verbose bash output floods the context – a single `docker logs` or `npm install` can dump 10K+ tokens into the window
- Redundant file reads waste tokens – Claude re-reads the same unchanged file 3-5 times per session
- No visibility into burn rate – you don’t know you’ve consumed 80% of your context until it’s too late and the session resets
- No plan-level quota awareness – you hit your weekly limit mid-task with no warning
AgentiHooks solves all four.
What you save
| Feature | What it prevents | Estimated token savings |
|---|---|---|
| Bash output filtering | Verbose docker/kubectl/git/test/build output flooding context | 5K-50K tokens per command |
| File read deduplication | Claude re-reading the same unchanged file multiple times | 2K-20K tokens per duplicate read |
| MCP lazy loading | 26 MCP tool schemas loaded upfront every turn | ~79K tokens per session |
| Smart compact suggestions | Generic “/compact” warnings that don’t tell you what to drop | Faster, more effective compaction |
| Context audit tracking | No visibility into what tools consume the most context | Informed compaction decisions |
| Thinking/effort policy | Extended thinking burning tens of thousands of output tokens | 10K-50K tokens per over-think |
| Peak hour awareness | Running expensive jobs during peak billing hours | Session budget preservation |
| MCP surface area reporting | Heavy MCP servers silently consuming context every turn | 10K-100K tokens per session |
| CLAUDE.md linting | Bloated CLAUDE.md paying tokens on every turn | 500-5K tokens per turn |
| MCP hygiene reminders | Unused MCP servers contributing schema tokens every turn | 10K-100K tokens per session |
A single session with all features active can save 100K-250K tokens compared to vanilla Claude Code. Over a week of heavy use, that’s the difference between hitting your quota on Wednesday vs. lasting through Friday.
Feature breakdown
1. Real-time cost display
Every turn, your status bar shows exactly what you’re spending:
```
############ 53% | Sonnet 4.6 | $0.1842 | 1h
ctx: 112K/200K | burn: 8K/turn | +42-12 | cache: 67% | main
```
Line 1: Context fill bar with color (green/yellow/red), model, cumulative session cost in USD, session duration.
Line 2: Raw token counts, burn rate per turn, lines changed, prompt cache hit ratio, git branch.
No setup required – active by default.
2. Bash output filtering
Detects verbose command output and truncates it before it enters the context window:
| Command type | Detection | Truncation |
|---|---|---|
| `docker logs` / `docker compose logs` | Command string match | Last 50 lines (configurable) |
| `kubectl` commands | Command string match | Last 50 lines |
| `git log` | Command string match | Last 20 commits |
| `pytest` / `jest` / `npm test` / `cargo test` | Output pattern match | Last 10 failure blocks |
| `npm install` / `pip install` / `cargo build` | Command string match | Last 5000 chars |
| Everything else | Fallback | Hard cap at 5000 chars |
The filter adds a clear notice when truncating:
```
[truncated: kept last 50 of 2847 lines]
```
Claude sees the most recent, most relevant output – not the entire history.
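The tail-keeping logic can be sketched in a few lines of Python – a hypothetical `truncate_tail` helper for illustration, not the actual hook code:

```python
def truncate_tail(output: str, max_lines: int = 50) -> str:
    """Keep only the last max_lines lines, prepending a truncation notice."""
    lines = output.splitlines()
    if len(lines) <= max_lines:
        return output  # short output passes through untouched
    kept = lines[-max_lines:]
    notice = f"[truncated: kept last {max_lines} of {len(lines)} lines]"
    return "\n".join([notice] + kept)
```

The real filter also dispatches on command type (line limits for logs, char caps for builds, failure blocks for tests), but each branch follows this same keep-the-tail pattern.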
Config:
| Variable | Default | What it controls |
|---|---|---|
| `BASH_FILTER_ENABLED` | `true` | Master switch |
| `BASH_FILTER_MAX_LINES` | 50 | Docker/kubectl line limit |
| `BASH_FILTER_MAX_CHARS` | 5000 | Build output char cap |
| `BASH_FILTER_TEST_MAX_FAILURES` | 10 | Test failure block limit |
| `BASH_FILTER_GIT_MAX_COMMITS` | 20 | Git log commit limit |
3. File read deduplication
Blocks Claude from re-reading files it already has in context – the single biggest source of wasted tokens in long sessions.
How it works:
- On every `Read` tool call, the cache records the file path and its `mtime`
- If Claude tries to read the same file again, the hook checks whether the file has been modified since
  - Modified? Read goes through, `mtime` is updated
  - Unchanged? Read is blocked with a message: “File already read this session and unchanged on disk”
The cache uses Redis when available (persists across hook invocations) with an in-memory fallback.
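The dedup check boils down to comparing a cached `mtime` against the file on disk. A minimal in-memory sketch, assuming a `FileReadCache` class name (the real hook adds the Redis backend and TTL handling):

```python
import os

class FileReadCache:
    """In-memory sketch of the mtime-based dedup check."""
    def __init__(self):
        self._seen = {}  # path -> mtime recorded at last allowed read

    def should_block(self, path: str) -> bool:
        mtime = os.stat(path).st_mtime
        if self._seen.get(path) == mtime:
            return True           # already read this session, unchanged on disk
        self._seen[path] = mtime  # first read, or file modified: allow and record
        return False
```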
Config:
| Variable | Default | What it controls |
|---|---|---|
| `FILE_READ_CACHE_ENABLED` | `true` | Master switch |
| `FILE_READ_CACHE_BACKEND` | `redis` | `redis` or `memory` |
| `FILE_READ_CACHE_TTL` | 21600 | Redis key TTL (6 hours) |
4. Context threshold warnings
Edge-triggered warnings fire exactly once per threshold crossing per session – no spam, no missed alerts:
| Threshold | Default | What happens |
|---|---|---|
| Warning | 60% | Yellow banner: “CONTEXT 60% – consider /compact soon” |
| Critical | 80% | Red banner: “CONTEXT 80% – /compact now or start new session” |
Warnings appear on statusline Line 3 so they’re impossible to miss. Edge-triggering is tracked in Redis – each level fires at most once.
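Edge-triggering just means remembering which thresholds have already fired. A sketch, assuming context fill is checked as a percentage each turn (`ThresholdWarner` is a hypothetical name; the real hook persists the fired set in Redis):

```python
class ThresholdWarner:
    """Fire each threshold at most once per session, even if fill oscillates."""
    def __init__(self, thresholds=(60, 80)):
        self.thresholds = sorted(thresholds)
        self.fired = set()

    def check(self, fill_pct: float) -> list:
        # Return every threshold crossed for the first time this session
        crossed = [t for t in self.thresholds
                   if fill_pct >= t and t not in self.fired]
        self.fired.update(crossed)
        return crossed
```

Note that a jump straight from 50% to 85% still surfaces both levels, so no alert is ever skipped.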
Config:
| Variable | Default |
|---|---|
| `TOKEN_WARN_PCT` | 60 |
| `TOKEN_CRITICAL_PCT` | 80 |
5. MCP tool lazy loading
Set `ENABLE_TOOL_SEARCH=true` (default) and all 26 MCP tools load on demand instead of injecting their full JSON schemas into every turn.
Before: ~79K tokens of tool schemas loaded upfront, every single turn. After: Tools appear as “(loaded on-demand)” and only expand when Claude actually uses them.
This is set in the env block of settings.json – the installer configures it automatically.
6. MCP hygiene reminders
At session start, AgentiHooks injects a reminder prompting Claude to check /mcp and disable any MCP servers not needed for the current task. Each disabled server saves its schema tokens on every subsequent turn.
Config: `MCP_HYGIENE_ENABLED=true` (default)
7. Burn rate tracking
Every turn, the statusline computes how many tokens were consumed since the previous turn:
```
burn: 8K/turn
```
This lets you spot runaway token consumption in real time – a burn: 45K/turn after a bash command tells you something verbose just hit the context. Requires Redis for cross-turn delta computation; omitted gracefully without it.
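The delta computation is a read-modify-write against the previous turn's total. A sketch using a plain dict in place of Redis (hypothetical `burn_rate` helper):

```python
def burn_rate(current_total: int, store: dict) -> int:
    """Tokens consumed since the previous turn; store stands in for Redis."""
    prev = store.get("tokens", 0)
    store["tokens"] = current_total  # persist for the next turn's delta
    return current_total - prev
```

Because the statusline process restarts every turn, the previous total has to live in external storage – which is why this feature needs Redis and degrades gracefully without it.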
8. Console quota monitoring (opt-in)
A background daemon scrapes your Claude.ai usage page and surfaces plan-level quota on statusline Line 3:
```
quota: session:53% [1h] | all:35% resets fri 10:00 am | sonnet:5% resets mon 12:00 am | extra: $40/99 (40%) resets apr 1
```
You see your weekly quota percentage, per-model breakdown, extra usage spend, and reset times – all color-coded (green < 60%, yellow < 80%, red above).
Multi-account support: You can authenticate multiple Claude.ai accounts and switch between them:
```bash
agentihooks quota auth work        # authenticate as "work"
agentihooks quota auth personal    # authenticate as "personal"
agentihooks quota list             # show all accounts
agentihooks quota switch personal  # switch active account
```
Account credentials are stored at ~/.agentihooks/quota-accounts/<name>.json.
How it works:
- `agentihooks quota auth` opens your real browser to `claude.ai`, prompts you to paste the `sessionKey` cookie, and saves credentials
- A headless Chromium daemon (`scripts/claude_usage_watcher.py`) starts in the background, loading the saved auth state once at startup
- Every `CLAUDE_USAGE_POLL_SEC` seconds it navigates to `claude.ai/settings/usage`, parses usage data from the page, and writes it atomically to `CLAUDE_USAGE_FILE`
- `hooks/statusline.py` reads that JSON file each turn and renders Line 3

Session cookie expiry: Playwright loads the auth state once at startup. If your session cookie expires, the daemon silently stops finding data – it does not re-authenticate. Run `agentihooks quota auth` again to refresh; it automatically kills the stale daemon and starts a fresh one with the new credentials.
Setup:
```bash
# One-time: install headless browser (playwright is a core dependency)
~/.agentihooks/.venv/bin/python -m playwright install chromium

# Authenticate: opens your browser, paste the sessionKey cookie when prompted
agentihooks quota auth

# Enable display in ~/.agentihooks/.env
echo 'CLAUDE_USAGE_FILE=~/.agentihooks/claude_usage.json' >> ~/.agentihooks/.env
```
CLI commands:
| Command | What it does |
|---|---|
| `agentihooks quota` | Start the background daemon |
| `agentihooks quota auth [name]` | Authenticate an account (kills + restarts daemon automatically) |
| `agentihooks quota list` | Show all configured accounts |
| `agentihooks quota switch <name>` | Switch active account |
| `agentihooks quota restart` | Restart daemon with current account |
| `agentihooks quota status` | Print the last known quota JSON |
| `agentihooks quota logs` | Tail the daemon log |
| `agentihooks quota stop` | Kill the daemon |
| `agentihooks quota remove <name>` | Remove an account |
| `agentihooks quota dump-html` | Dump raw usage page HTML for debugging |
Config:
| Variable | Default | What it controls |
|---|---|---|
| `CLAUDE_USAGE_FILE` | – | Path to quota JSON (enables the feature) |
| `CLAUDE_USAGE_POLL_SEC` | 60 | Daemon poll interval |
| `CLAUDE_USAGE_STALE_SEC` | 300 | Data staleness threshold – statusline shows “stale” if data is older than this |
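The staleness check compares the quota file's age against `CLAUDE_USAGE_STALE_SEC`. A sketch of how a consumer like the statusline might read it – `read_quota` is a hypothetical helper, not the hook's actual API:

```python
import json
import os
import time

def read_quota(path: str, stale_sec: int = 300):
    """Return (data, is_stale); stale when the file's mtime exceeds stale_sec."""
    try:
        age = time.time() - os.stat(path).st_mtime
        with open(path) as f:
            return json.load(f), age > stale_sec
    except (OSError, ValueError):
        return None, True  # missing or unparseable file counts as stale
```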
Troubleshooting – “No quota data found” in daemon log:
The daemon logs this when it successfully loads the page but finds no usage data – typically because the page structure changed or the session cookie expired.
```bash
# 1. Check if the daemon is running and what it last scraped
agentihooks quota status
agentihooks quota logs

# 2. Dump the raw page HTML to inspect what Playwright is actually seeing
agentihooks quota dump-html
cat ~/.agentihooks/usage_debug.html | grep -i 'session\|usage\|progress\|percent'

# 3. If the cookie is stale, re-authenticate (auto-restarts daemon)
agentihooks quota auth
```
9. Context audit tracking
Tracks cumulative byte output per tool type across the session. When context fill exceeds the audit threshold on Stop, a report is logged showing the top 5 consumers.
```
Context audit (fill: 82%, total tool output: 245K):
  Read:  120K (49%)
  Bash:   65K (27%)
  Agent:  38K (16%)
  Edit:   12K (5%)
  Grep:   10K (4%)
```
| Variable | Default | What it controls |
|---|---|---|
| `CONTEXT_AUDIT_ENABLED` | `true` | Enable per-tool tracking |
| `CONTEXT_AUDIT_THRESHOLD_PCT` | 70 | Emit report when fill exceeds this % |
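Conceptually the audit is a per-tool byte counter. A sketch of the accumulation and top-N report – the `ContextAudit` class is illustrative, not the hook's actual implementation:

```python
from collections import Counter

class ContextAudit:
    """Accumulate tool output bytes and report the top consumers."""
    def __init__(self):
        self.bytes = Counter()

    def record(self, tool: str, output: str):
        self.bytes[tool] += len(output.encode())

    def top(self, n: int = 5):
        """Return (tool, bytes, percent-of-total) for the n heaviest tools."""
        total = sum(self.bytes.values()) or 1
        return [(tool, b, round(100 * b / total))
                for tool, b in self.bytes.most_common(n)]
```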
10. Smart compact suggestions
Replaces generic “/compact” warnings with actionable suggestions based on context audit data:
```
CONTEXT 65% -- consider /compact soon -- top consumers: Read (50K), Bash (32K), Agent (28K)
```
| Variable | Default | What it controls |
|---|---|---|
| `COMPACT_SUGGEST_ENABLED` | `true` | Use smart suggestions vs generic warnings |
11. Thinking/effort policy
Injects effort guidance at session start based on profile settings. Warns when subagents are spawned with unnecessarily expensive models.
```
TOKEN EFFICIENCY: Default effort: medium. Reserve high/ultrathink for complex
architectural decisions. Prefer Sonnet for implementation; reserve Opus for planning.
```
| Variable | Default | What it controls |
|---|---|---|
| `EFFORT_POLICY_ENABLED` | `true` | Inject effort guidance at session start |
| `DEFAULT_EFFORT` | `medium` | Default reasoning depth (low/medium/high) |
| `THINKING_BUDGET_TOKENS` | 0 | Advisory token ceiling (0 = no limit) |
12. Peak/off-peak awareness
Detects Anthropic’s peak billing hours (weekday business hours US Pacific) and shows an indicator on the statusline. When session usage is high during peak hours, adds a warning.
```
quota: session:62% [1h] | PEAK -- sessions burn faster during business hours
```
| Variable | Default | What it controls |
|---|---|---|
| `PEAK_HOURS_ENABLED` | `true` | Show peak indicator on statusline |
| `PEAK_HOURS_START` | 9 | Peak start hour |
| `PEAK_HOURS_END` | 17 | Peak end hour |
| `PEAK_HOURS_TZ` | `US/Pacific` | Timezone for peak calculation |
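The peak check reduces to "weekday, within business hours, in the configured timezone". A sketch using the standard library's `zoneinfo` – note `America/Los_Angeles` stands in for the legacy `US/Pacific` alias, and `is_peak` is a hypothetical name:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

def is_peak(now: datetime = None, start: int = 9, end: int = 17,
            tz: str = "America/Los_Angeles") -> bool:
    """Peak = weekday business hours in the configured timezone."""
    now = now or datetime.now(ZoneInfo(tz))
    return now.weekday() < 5 and start <= now.hour < end
```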
13. MCP surface area reporting
CLI tool that analyzes MCP server configurations and reports estimated token overhead. Also warns at session start if total tools exceed a threshold.
```
$ agentihooks mcp report

MCP Surface Area Report
Total: 9 servers, ~112 tools, ~16,800 schema tokens

Server       Source  Tools  ~Tokens
hooks-utils  user    32     4,800
github       user    40     6,000
...
```
| Variable | Default | What it controls |
|---|---|---|
| `MCP_TOOL_WARN_THRESHOLD` | 40 | Warn at session start if total tools exceed this |
| `MCP_SCHEMA_AVG_TOKENS` | 150 | Estimated tokens per tool schema |
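The estimate in the report is simply tool count times `MCP_SCHEMA_AVG_TOKENS`. A sketch with a hypothetical helper (the server names and counts below are illustrative):

```python
def estimate_mcp_overhead(servers: dict, avg_tokens: int = 150):
    """servers maps name -> tool count; returns (total tools, estimated schema tokens)."""
    total_tools = sum(servers.values())
    return total_tools, total_tools * avg_tokens
```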
14. CLAUDE.md linting and skill extraction
CLI tool that analyzes CLAUDE.md token cost and suggests extracting workflow-specific sections into on-demand skills.
```bash
agentihooks lint-claude                            # analyze ~/.claude/CLAUDE.md
agentihooks extract-skill "Commands" --name cmds   # extract to skill
```
Moving workflow-specific content from CLAUDE.md (loaded every turn) to skills (loaded on demand) reduces base context cost by 500-5K tokens per turn.
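You can approximate the per-turn cost of your own CLAUDE.md with the common ~4-characters-per-token heuristic – a rough sketch, not the linter's actual tokenizer:

```python
def estimate_turn_cost(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate for content that is loaded on every turn."""
    return round(len(text) / chars_per_token)
```

A 10 KB CLAUDE.md lands around 2,500 tokens by this estimate – paid again on every single turn, which is exactly the cost extraction into skills avoids.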
Everything at a glance
| Layer | Feature | Default | Tokens saved | Config |
|---|---|---|---|---|
| Output | Bash output filtering | On | 5K-50K/cmd | `BASH_FILTER_ENABLED` |
| Input | File read dedup | On | 2K-20K/read | `FILE_READ_CACHE_ENABLED` |
| Schema | MCP lazy loading | On | ~79K/session | `ENABLE_TOOL_SEARCH` |
| Schema | MCP hygiene reminder | On | 10K-100K/session | `MCP_HYGIENE_ENABLED` |
| Schema | MCP surface area reporting | On | 10K-100K/session | `MCP_TOOL_WARN_THRESHOLD` |
| Awareness | Statusline cost/burn | On | Prevents waste | `TOKEN_MONITOR_ENABLED` |
| Awareness | Context warnings (smart) | On | Prevents resets | `COMPACT_SUGGEST_ENABLED` |
| Awareness | Context audit | On | Informed compaction | `CONTEXT_AUDIT_ENABLED` |
| Awareness | Peak hour indicator | On | Budget preservation | `PEAK_HOURS_ENABLED` |
| Awareness | Console quota display | Opt-in | Prevents limit hits | `CLAUDE_USAGE_FILE` |
| Decode | Thinking/effort policy | On | 10K-50K/over-think | `EFFORT_POLICY_ENABLED` |
| Startup | CLAUDE.md linting | CLI | 500-5K/turn | `agentihooks lint-claude` |
Master switch: Set `TOKEN_CONTROL_ENABLED=false` to disable all token control features at once. Individual features can be toggled independently.
Quick start
Everything except quota monitoring works out of the box after installation:
```bash
# Install agentihooks -- all cost features are enabled by default
agentihooks init
```
To add quota monitoring (playwright ships with the package):
```bash
~/.agentihooks/.venv/bin/python -m playwright install chromium
agentihooks quota auth
echo 'CLAUDE_USAGE_FILE=~/.agentihooks/claude_usage.json' >> ~/.agentihooks/.env
```
Verify everything is working:
```bash
agentihooks status
```
This shows your full system health: profile, hooks, Python, daemons, Redis, OTEL, all 6 cost guardrails with descriptions, your entire MCP fleet with real tool counts (queried via MCP protocol, cached 1h), per-project enabled/disabled state, and quota summary with peak/off-peak indicator.
Inside a Claude session, use /agentihooks for the same diagnostics plus live session metrics (context fill, burn rate, per-tool consumption).