Pillar 3: Context Intelligence

Your agents never forget their instructions, even 200 turns deep.

“Your rules stop being effective when Claude’s context window pushes them to token position 50,000. Context Intelligence re-injects them before that ever happens.”

Table of contents

  1. The Attention Decay Problem
  2. The Four Layers
  3. Context Refresh
    1. How it works
    2. Refresh lifecycle
    3. Separate cadences by design
    4. Configuration
  4. Priority Frontmatter
    1. Syntax
    2. Budget enforcement
    3. Tip: audit your rule sizes
  5. Token Compression Preprocessor
    1. The insight
    2. Compression levels
    3. The compression pipeline
    4. The protection mask
    5. Abbreviation dictionary
    6. Compression scope
  6. Tool Memory
    1. How it works
    2. Example injection
    3. Configuration
  7. Context Audit
    1. What it tracks
    2. Smart compact suggestions
    3. Configuration
  8. Thinking Policy
    1. The problem with unconstrained reasoning
    2. How agentihooks governs it
    3. Configuration
  9. Putting It Together
  10. Quick Reference

The Attention Decay Problem

Large language models read your rules once — at position zero in the context window. As a conversation grows past 50, 100, or 200 turns, that early content drifts toward the attention horizon. The model hasn’t “forgotten” the bytes, but the transformer’s attention heads weight recent tokens far more heavily than early ones. Instructions that felt ironclad at turn 1 become soft suggestions by turn 80.

This is the number-one reliability killer for long agent sessions:

  • The agent starts ignoring clearance constraints
  • Commit style drifts away from the operator’s format
  • Tool calling patterns revert to defaults
  • Security rules erode silently

Context Intelligence is agentihooks’ answer: a multi-layer system that actively fights attention decay by keeping critical context fresh in the model’s recent window at all times.


The Four Layers

┌─────────────────────────────────────────────────────────────────────┐
│                     Context Intelligence Stack                      │
│                                                                     │
│  ┌──────────────┐  ┌─────────────────┐  ┌──────────────────────┐   │
│  │   Context    │  │    Priority     │  │   Token              │   │
│  │   Refresh    │  │   Frontmatter   │  │   Compression        │   │
│  │              │  │                 │  │                      │   │
│  │ rules / 20t  │  │ priority: 1 → N │  │ 4 levels: off→aggr   │   │
│  │ CLAUDE.md    │  │ loaded first    │  │ 30–55% savings       │   │
│  │ / 40 turns   │  │ within budget   │  │ with mask safety     │   │
│  └──────────────┘  └─────────────────┘  └──────────────────────┘   │
│                                                                     │
│  ┌──────────────┐  ┌─────────────────┐  ┌──────────────────────┐   │
│  │    Tool      │  │    Context      │  │   Thinking           │   │
│  │   Memory     │  │     Audit       │  │    Policy            │   │
│  │              │  │                 │  │                      │   │
│  │ past errors  │  │ top consumers   │  │ effort guidance      │   │
│  │ pre-tool use │  │ on session stop │  │ per model/profile    │   │
│  └──────────────┘  └─────────────────┘  └──────────────────────┘   │
└─────────────────────────────────────────────────────────────────────┘

Context Refresh

How it works

On every UserPromptSubmit event, agentihooks increments a per-session turn counter. Two independent timers govern re-injection:

| Timer | Default | Target |
|---|---|---|
| Rules refresh | every 20 turns | ~/.claude/rules/*.md + project .claude/rules/*.md |
| CLAUDE.md refresh | every 40 turns | ~/.claude/CLAUDE.md + project CLAUDE.md |

Turn state is persisted in Redis with a 24-hour TTL and falls back to a per-session JSON file at ~/.agentihooks/ctx_refresh_{session_id}.json. Because each hook invocation is a separate subprocess, in-memory state is not an option — persistence is mandatory.
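
The bookkeeping above can be sketched as follows — a minimal version of the JSON-file fallback path, with hypothetical helper names (the real store prefers Redis with a 24-hour TTL and only falls back to the file):

```python
import json
import os

RULES_INTERVAL = 20      # CONTEXT_REFRESH_INTERVAL
CLAUDE_MD_INTERVAL = 40  # CONTEXT_REFRESH_CLAUDE_MD_INTERVAL

def state_path(session_id: str, base_dir: str) -> str:
    # Per-session fallback file, e.g. ~/.agentihooks/ctx_refresh_{session_id}.json
    return os.path.join(base_dir, f"ctx_refresh_{session_id}.json")

def on_user_prompt_submit(session_id: str, base_dir: str) -> dict:
    """Increment the persisted turn counter and report which refreshes fire."""
    path = state_path(session_id, base_dir)
    try:
        with open(path) as fh:
            state = json.load(fh)
    except (FileNotFoundError, json.JSONDecodeError):
        state = {"turn": 0}

    state["turn"] += 1
    with open(path, "w") as fh:
        json.dump(state, fh)

    turn = state["turn"]
    return {
        "turn": turn,
        "refresh_rules": turn % RULES_INTERVAL == 0,
        # 0 disables the CLAUDE.md refresh entirely
        "refresh_claude_md": CLAUDE_MD_INTERVAL > 0 and turn % CLAUDE_MD_INTERVAL == 0,
    }
```

Because each hook invocation starts in a fresh subprocess, every call round-trips through the file; at turn 40 both `refresh_rules` and `refresh_claude_md` fire.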

Refresh lifecycle

flowchart TD
    A[UserPromptSubmit] --> B[Increment turn counter]
    B --> C{turn % 20 == 0?}
    C -- Yes --> D[Load rules/*.md\nfrom global + project dirs]
    C -- No --> E{turn % 40 == 0?}
    D --> F[Parse priority frontmatter\nfrom each file]
    F --> G[Sort ascending by priority,\nthen alphabetically]
    G --> H[Apply token compression\nat configured level]
    H --> I{Total within\n8000-char budget?}
    I -- Yes --> J[Inject all rules\nas system-reminder banner]
    I -- No --> K[Inject until budget full,\nappend omitted count]
    J --> E
    K --> E
    E -- Yes --> M[Load CLAUDE.md\nglobal + project]
    E -- No --> L[Persist turn state]
    M --> N[Compress CLAUDE.md]
    N --> O[Inject as separate\nsystem-reminder banner]
    O --> L
    L --> P[Session continues]

Separate cadences by design

Rules and CLAUDE.md refresh on different timers intentionally. CLAUDE.md can be large (operator profiles often exceed 3,000 characters). Injecting it every 20 turns alongside rules would double the injection payload and exhaust the 8,000-character budget. By decoupling the cadences, each injection type gets its own full budget slice.

Configuration

| Variable | Default | Description |
|---|---|---|
| CONTEXT_REFRESH_ENABLED | true | Enable/disable periodic re-injection |
| CONTEXT_REFRESH_INTERVAL | 20 | Re-inject rules every N user messages |
| CONTEXT_REFRESH_CLAUDE_MD_INTERVAL | 40 | Re-inject CLAUDE.md every N user messages. 0 disables it. |
| CONTEXT_REFRESH_MAX_CHARS | 8000 | Max characters per injection (~2,000 tokens) |
| CONTEXT_REFRESH_RULES_DIR | ~/.claude/rules | Global rules directory |
| CONTEXT_REFRESH_INCLUDE_PROJECT | true | Also inject project-level rules from the active working directory |

Priority Frontmatter

When context is tight — and it always is in long sessions — not all rules can fit within the 8,000-character budget. Priority frontmatter lets you declare which rules are load-critical.

Syntax

Add a YAML frontmatter block to any rule file:

---
priority: 1
---

# My Critical Rule

Never commit credentials. Reference secrets via environment variables only.

Priority scale: Lower number = higher priority = loaded first.

| Priority | Meaning |
|---|---|
| 1 | Always load. Mission-critical. Security constraints. |
| 2 | Load before most rules. Core behavioral contracts. |
| 3 | Standard importance. |
| 5 | Default. Applied when no frontmatter is present. |
| 8–10 | Nice-to-have. First to be dropped under budget pressure. |
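
Extracting the priority can be sketched like this — a minimal parser assuming the frontmatter is a plain `priority: N` line between `---` markers (hypothetical helper name; the real parser may accept full YAML):

```python
import re

DEFAULT_PRIORITY = 5  # applied when no frontmatter is present

def rule_priority(text: str) -> int:
    """Extract `priority: N` from a leading YAML frontmatter block."""
    match = re.match(r"\A---\s*\n(.*?)\n---\s*\n", text, re.DOTALL)
    if match:
        for line in match.group(1).splitlines():
            key, _, value = line.partition(":")
            if key.strip() == "priority" and value.strip().isdigit():
                return int(value.strip())
    return DEFAULT_PRIORITY
```

A file that starts with `---` / `priority: 1` / `---` yields 1; a file with no frontmatter falls back to the default of 5.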

Budget enforcement

Rules are sorted by (priority, filename) — ascending priority, then alphabetically within the same priority level. The injection loop fills from the top until the budget is exhausted. Any rules that don’t fit are dropped and replaced with a count line:

[3 rule(s) omitted — size limit reached]

If you see this in your logs, either raise CONTEXT_REFRESH_MAX_CHARS or assign explicit priorities so your most important rules carry the lowest numbers and are never the ones dropped.
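
The selection loop can be sketched as follows, assuming each rule is a `(priority, filename, text)` tuple (hypothetical function name; a sketch of the documented behavior, not the actual implementation):

```python
def fill_budget(rules, max_chars=8000):
    """Sort by (priority, filename) and inject until the budget is exhausted."""
    injected = []
    omitted = 0
    used = 0
    # Ascending priority, then alphabetical within the same priority level
    for priority, filename, text in sorted(rules, key=lambda r: (r[0], r[1])):
        if used + len(text) <= max_chars:
            injected.append(text)
            used += len(text)
        else:
            omitted += 1
    if omitted:
        injected.append(f"[{omitted} rule(s) omitted — size limit reached]")
    return "\n\n".join(injected)
```

Note that a rule which does not fit is skipped rather than truncated, so a smaller lower-priority rule later in the order can still be injected.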

Tip: audit your rule sizes

# See which rules are largest
wc -c ~/.claude/rules/*.md | sort -rn

# Approximate which rules would be dropped at the 8000-char budget
# (orders by filename only; the real injection sorts by priority first)
python3 -c "
import os
rules_dir = os.path.expanduser('~/.claude/rules')
files = sorted(f for f in os.listdir(rules_dir) if f.endswith('.md'))
total = 0
for f in files:
    size = os.path.getsize(os.path.join(rules_dir, f))
    total += size
    status = 'OK' if total <= 8000 else 'DROPPED'
    print(f'{status:8s} {total:6d}  {f}')
"

Token Compression Preprocessor

The insight

LLMs predict over subword tokens, not characters. The BPE tokenizer splits "authentication" into 3 tokens (auth, ent, ication). Writing "auth" instead costs 1 token — and the model activates the same semantic representation. The surrounding context (credentials, secrets, env vars) provides enough signal for the attention mechanism to infer full meaning.

This means you can compress injected content by 30–55% while fully preserving LLM comprehension — as long as you protect the tokens that carry critical operational semantics.

Compression levels

| Level | Name | What it does | Token savings | Per 100-turn session |
|---|---|---|---|---|
| 0 | off | Passthrough — no modification | 0% | 0 tokens |
| 1 | light | Strip markdown formatting | ~5–10% | ~200–500 tokens |
| 2 | standard | Level 1 + filler word removal + abbreviation dict | ~10–20% | ~2,000–4,000 tokens |
| 3 | aggressive | Level 2 + internal vowel removal on long words | ~20–35% | ~4,000–8,000 tokens |

Default is standard. Most operators see 10–20% per injection and 2,000–4,000 tokens saved per 100-turn session with zero behavioral change.

The compression pipeline

flowchart LR
    A[Raw rule text] --> B[Build protection mask]
    B --> C{Level >= 1?}
    C -- Yes --> D[Strip Mermaid blocks\nRemove markdown headers\nFlatten tables\nStrip bold/italic\nRemove horizontal rules]
    D --> E{Level >= 2?}
    C -- No --> Z[Output text]
    E -- Yes --> F[Remove filler words\na/an/the/is/are/was/were\nin/on/at/to/of/for/with...]
    F --> G[Apply abbreviation dict\nauthentication→auth\nkubernetes→k8s\nconfiguration→cfg\nproduction→prod...]
    G --> H{Level >= 3?}
    E -- No --> Z
    H -- Yes --> I[Remove internal vowels\nfrom long words only\nlength >= 7 chars]
    H -- No --> Z
    I --> Z
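
A simplified sketch of the level-1 and level-2 transforms — illustrative filler and abbreviation subsets only, and omitting the protection-mask check that the real pipeline runs first:

```python
import re

# Tiny illustrative subsets of the documented filler list and abbreviation dict
FILLERS = {"a", "an", "the", "is", "are", "was", "were",
           "in", "on", "at", "to", "of", "for", "with"}
ABBREVS = {"authentication": "auth", "kubernetes": "k8s",
           "configuration": "cfg", "production": "prod"}

def compress_level2(text: str) -> str:
    # Level 1: strip markdown headers and bold/italic markers
    text = re.sub(r"^#{1,6}\s*", "", text, flags=re.MULTILINE)
    text = re.sub(r"[*_]{1,3}(\S(?:.*?\S)?)[*_]{1,3}", r"\1", text)
    # Level 2: remove filler words
    text = " ".join(w for w in text.split() if w.lower() not in FILLERS)
    # Level 2: apply abbreviations, longest match first to avoid partial substitutions
    for full in sorted(ABBREVS, key=len, reverse=True):
        text = re.sub(rf"\b{full}\b", ABBREVS[full], text, flags=re.IGNORECASE)
    return text
```

For example, `"## The authentication configuration is stored in the production cluster"` compresses to `"auth cfg stored prod cluster"` — far fewer tokens, same operational meaning.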

The protection mask

Before any compression runs, a protection mask identifies spans that must never be modified. A single mistakenly compressed negation or action verb can flip behavioral meaning — the mask prevents that.

Protected categories:

| Category | Examples | Why |
|---|---|---|
| Negation words | never, don't, not, cannot, won't | Meaning is in the negation |
| Assertion words | always, must, required, mandatory, only, exactly | Constraint strength lives here |
| Action verbs | push, delete, commit, deploy, force, reset, purge | Operational semantics — compressing delete is dangerous |
| ALL_CAPS identifiers | CONTEXT_REFRESH_MAX_CHARS, MY_API_KEY | Env var names must be exact |
| Code blocks | `command`, ```block``` | Already minimal; compressing code breaks it |
| File paths | ~/.claude/rules, ./hooks/context | Identifiers must be exact |
| Numbers | 8000, 20, 40, 100% | Thresholds must survive intact |
| CLI subcommands | kubectl apply, docker rm, git push | Tool invocations must be preserved |

The mask uses character-level span tracking. When any compression transform matches a token range, it checks for overlap with the mask before substituting. Protected spans are skipped entirely.
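
The span tracking can be sketched like this — a small illustrative subset of the protected patterns, with hypothetical helper names:

```python
import re

# Illustrative subset of the protected categories
PROTECTED_PATTERNS = [
    r"\b(?:never|don't|not|cannot|won't|always|must|only)\b",  # negation/assertion words
    r"\b[A-Z][A-Z0-9_]{2,}\b",                                 # ALL_CAPS identifiers
    r"`[^`]+`",                                                # inline code
    r"\b\d+%?\b",                                              # numbers and thresholds
]

def build_mask(text: str) -> list[tuple[int, int]]:
    """Collect character spans that no compression transform may touch."""
    spans = []
    for pattern in PROTECTED_PATTERNS:
        for m in re.finditer(pattern, text):
            spans.append(m.span())
    return spans

def is_protected(start: int, end: int, mask: list[tuple[int, int]]) -> bool:
    """True if the half-open range [start, end) overlaps any protected span."""
    return any(start < s_end and end > s_start for s_start, s_end in mask)
```

Before substituting a matched range, a transform calls `is_protected(start, end, mask)` and skips the replacement on any overlap.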

Abbreviation dictionary

The built-in dictionary covers 50+ common DevOps/infrastructure terms. All replacements are longest-match-first to avoid partial substitutions.

| Full term | Abbreviated | Full term | Abbreviated |
|---|---|---|---|
| authentication | auth | kubernetes | k8s |
| authorization | authz | configuration | cfg |
| environment | env | deployment | deploy |
| infrastructure | infra | repository | repo |
| namespace | ns | application | app |
| production | prod | database | db |
| development | dev | service | svc |
| permissions | perms | certificate | cert |
| parameter | param | operation | op |
| specification | spec | automatically | auto |

You can extend the dictionary with a user-supplied JSON file:

# ~/.agentihooks/.env
CONTEXT_REFRESH_ABBREV_FILE=~/.claude/my-abbreviations.json

# ~/.claude/my-abbreviations.json
{
  "entries": {
    "microservice": "microsvc",
    "observability": "o11y",
    "authentication-token": "auth-token"
  }
}

Compression scope

| CONTEXT_COMPRESSION_SCOPE | Applies to |
|---|---|
| refresh (default) | Context refresh injections only |
| all | All inject_context() / inject_banner() calls + bash output filter additionalContext — session start banners, secrets warnings, tool memory, circuit breaker messages, threshold warnings |

Setting scope=all compounds savings across every injection point in the hook system.


Tool Memory

Agents learn from mistakes — but only within a single session. Close the terminal and the lesson is gone. Tool Memory persists cross-session error learning so agents don’t repeat the same failure twice.

How it works

At PreToolUse: Before the agent invokes any tool, agentihooks injects a TOOL MEMORY banner with the last N error entries from the NDJSON store. The agent reads these before acting.

At PostToolUse: If the tool response contains an error (non-zero exit code, is_error: true flag, or matched error patterns in Bash output), the error is recorded to the persistent store.

At Stop: Session-end transcript scan catches MCP tool errors that Claude Code does not fire PostToolUse for — they’re recorded retroactively so future sessions benefit.
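
The store itself can be sketched as a capped NDJSON append — hypothetical function names, a minimal version of the record/read cycle described above:

```python
import json
import os

def record_error(path: str, tool: str, error: str, tool_input: str,
                 max_entries: int = 100) -> None:
    """PostToolUse: append an error entry, dropping the oldest past the cap."""
    entry = {"tool": tool, "error": error, "input": tool_input}
    lines = []
    if os.path.exists(path):
        with open(path) as fh:
            lines = [ln for ln in fh.read().splitlines() if ln.strip()]
    lines.append(json.dumps(entry))
    with open(path, "w") as fh:
        fh.write("\n".join(lines[-max_entries:]) + "\n")

def recent_errors(path: str, show: int = 15) -> list[dict]:
    """PreToolUse: load the last `show` entries for the TOOL MEMORY banner."""
    if not os.path.exists(path):
        return []
    with open(path) as fh:
        lines = [ln for ln in fh.read().splitlines() if ln.strip()]
    return [json.loads(ln) for ln in lines[-show:]]
```

One JSON object per line keeps appends cheap and makes the store greppable; the cap corresponds to AGENTICORE_TOOL_MEMORY_MAX and the read window to AGENTICORE_TOOL_MEMORY_SHOW.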

Example injection

┌─ TOOL MEMORY: Lessons from past sessions ──────────────────────────┐
│ [2026-04-03 14:22] Bash -- Permission denied: /var/run/docker.sock  │
│   (input: docker ps -a)                                             │
│ [2026-04-05 09:14] mcp__anton__unraid_ssh -- Connection refused     │
│   (input: host=192.168.1.10)                                        │
│ [2026-04-06 22:38] Bash -- kubectl: command not found               │
│   (input: kubectl rollout status deploy/api -n prod)                │
└────────────────────────────────────────────────────────────────────┘

Deduplication: memory is injected once per tool per session — the same tool won’t receive a repeated banner on every invocation. This prevents noise in long sessions while still providing the warning at first use.

Configuration

| Variable | Default | Description |
|---|---|---|
| AGENTICORE_TOOL_MEMORY_PATH | ~/.agenticore_tool_memory.ndjson | Path to the NDJSON store |
| AGENTICORE_TOOL_MEMORY_MAX | 100 | Maximum entries to retain (oldest dropped first) |
| AGENTICORE_TOOL_MEMORY_SHOW | 15 | Entries to inject per PreToolUse event |

Context Audit

What it tracks

On every PostToolUse, agentihooks records the byte size of the tool output against the tool name in a per-session Redis hash (with in-process fallback). By session end, you have a complete picture of what ate your context budget.

At Stop, if context fill exceeds CONTEXT_AUDIT_THRESHOLD_PCT (default 70%), the audit report is emitted:

Context audit (fill: 78%, total tool output: 842K):
  Bash: 512K (61%)
  mcp__jira__search_issues: 183K (22%)
  Read: 97K (12%)
  mcp__github__list_prs: 31K (4%)
  Write: 19K (2%)
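
The accumulation and report formatting can be sketched like this — a plain dict standing in for the per-session Redis hash, with hypothetical function names:

```python
def record_output(usage: dict, tool: str, output: str) -> None:
    """PostToolUse: accumulate output bytes against the tool name."""
    usage[tool] = usage.get(tool, 0) + len(output.encode("utf-8"))

def audit_report(usage: dict, fill_pct: int, threshold_pct: int = 70) -> str:
    """Stop: emit top consumers only when context fill exceeds the threshold."""
    if fill_pct <= threshold_pct or not usage:
        return ""
    total = sum(usage.values())
    lines = [f"Context audit (fill: {fill_pct}%, total tool output: {total // 1024}K):"]
    # Largest consumers first
    for tool, size in sorted(usage.items(), key=lambda kv: kv[1], reverse=True):
        lines.append(f"  {tool}: {size // 1024}K ({100 * size // total}%)")
    return "\n".join(lines)
```

Below the threshold the report stays silent; above it, the operator sees exactly which tools ate the budget.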

Smart compact suggestions

When the audit report is available, the generic /compact reminder is replaced with a targeted suggestion naming the actual top consumers — so the operator knows exactly what to trim before compacting.

Configuration

| Variable | Default | Description |
|---|---|---|
| CONTEXT_AUDIT_ENABLED | true | Enable per-tool byte tracking |
| CONTEXT_AUDIT_THRESHOLD_PCT | 70 | Fill % threshold for emitting the report at session end |
| COMPACT_SUGGEST_ENABLED | true | Replace generic /compact warnings with audit-informed suggestions |

Thinking Policy

The problem with unconstrained reasoning

Extended thinking models accumulate reasoning tokens that count against the context window. An agent using high effort on every response — including trivial git status calls — can exhaust 10–20K tokens in pure reasoning per session before accomplishing anything.

How agentihooks governs it

At SessionStart, agentihooks injects effort guidance calibrated to the configured DEFAULT_EFFORT profile:

| Profile | Injected guidance |
|---|---|
| low | Minimal reasoning for straightforward tasks. Escalate to medium/high only for complex architectural decisions. |
| medium | Standard reasoning depth. Reserve high/ultrathink for complex decisions. Prefer Sonnet for implementation; Opus for planning. |
| high | Full reasoning enabled. No constraint injected. |

At PostToolUse, if an Agent tool is spawned with model=opus under a low or medium profile, a warning is emitted before the subagent runs.

Configuration

| Variable | Default | Description |
|---|---|---|
| EFFORT_POLICY_ENABLED | true | Inject effort guidance at session start |
| DEFAULT_EFFORT | medium | Effort profile: low, medium, high |
| THINKING_BUDGET_TOKENS | 0 | Advisory thinking token ceiling per response. 0 = no limit. |

Putting It Together

A 200-turn session with a 10-file rule profile might look like this:

| Turn | Context Intelligence activity |
|---|---|
| 1 | Session start: thinking policy injected, tool memory injected |
| 20 | Rules refresh: 10 files sorted by priority, compressed (standard), injected (8K budget) |
| 40 | Rules refresh + CLAUDE.md refresh (both fire at turn 40) |
| 60 | Rules refresh |
| 80 | Rules refresh + CLAUDE.md refresh |
| … | Pattern continues every 20/40 turns |
| 200 | Stop: context audit report emitted (78% fill), top consumers named |

At each refresh, the rules land in the model’s recent context window — not at position zero. Attention weights are high. Instructions are fresh. The agent behaves as if the session just started.

That’s Context Intelligence.


Quick Reference

# Key env vars for ~/.agentihooks/.env
CONTEXT_REFRESH_ENABLED=true
CONTEXT_REFRESH_INTERVAL=20
CONTEXT_REFRESH_CLAUDE_MD_INTERVAL=40
CONTEXT_REFRESH_MAX_CHARS=8000
CONTEXT_REFRESH_COMPRESSION=standard      # off | light | standard | aggressive
CONTEXT_COMPRESSION_SCOPE=refresh         # refresh | all

CONTEXT_AUDIT_ENABLED=true
CONTEXT_AUDIT_THRESHOLD_PCT=70

EFFORT_POLICY_ENABLED=true
DEFAULT_EFFORT=medium                      # low | medium | high
THINKING_BUDGET_TOKENS=0

AGENTICORE_TOOL_MEMORY_MAX=100
AGENTICORE_TOOL_MEMORY_SHOW=15

# Add priority frontmatter to critical rule files
# ~/.claude/rules/operator-clearance.md
---
priority: 1
---
# Clearance Rules
...