Architecture

System architecture, core loop, and design principles for tccw-autoresearch.

Overview

tccw-autoresearch is a stack-agnostic autonomous improvement engine that applies the Karpathy autoresearch loop pattern to any codebase. On each iteration it edits the target files, runs an immutable harness, and measures a single metric. It keeps the change if the metric improved and discards it otherwise.

Core Loop

LOOP:
  Generate improvement idea (AI agent)
  Edit mutable target files
  Run immutable harness (metric command)
  Extract metric value
  If improved → git commit, record result
  If worse → git reset, record result
  If guard fails → attempt rework (up to N times)
REPEAT until budget exhausted or target reached
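
The keep-or-discard logic of the loop can be sketched in a few lines of Python. This is a minimal illustration, not the code in engine.py: `run_loop`, `LoopResult`, and the four callbacks are hypothetical names, the git operations are passed in as plain callables, and guard rework is left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class LoopResult:
    best: float
    # One (iteration, metric, kept) tuple per experiment.
    history: List[Tuple[int, float, bool]] = field(default_factory=list)


def run_loop(
    apply_idea: Callable[[int], None],  # hypothetical: the agent edits target files
    measure: Callable[[], float],       # hypothetical: run harness, return the metric
    keep: Callable[[], None],           # hypothetical: e.g. wraps `git commit`
    discard: Callable[[], None],        # hypothetical: e.g. wraps `git reset`
    budget: int,
    target: Optional[float] = None,
    higher_is_better: bool = True,
) -> LoopResult:
    best = measure()  # baseline before any edit
    result = LoopResult(best=best)
    for i in range(budget):
        apply_idea(i)
        value = measure()
        improved = value > best if higher_is_better else value < best
        if improved:
            keep()
            best = value
        else:
            discard()
        result.history.append((i, value, improved))
        result.best = best
        # Stop early once the target is reached.
        if target is not None:
            reached = best >= target if higher_is_better else best <= target
            if reached:
                break
    return result
```

Passing the side effects in as callables keeps the decision logic testable without a repository or an agent. A real run would bind them to the agent subprocess, the metric command, and git.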

Hard Dependency: Claude Code

tccw-autoresearch is an orchestrator, not a code editor. The actual code changes are performed by Claude Code agents (the claude CLI), which the orchestrator spawns as subprocesses.

autoresearch (orchestrator)
  ├── spawns claude (agent) ← hard dependency, does the actual coding
  │     ├── reads mutable target files
  │     ├── edits code based on improvement ideas
  │     └── returns control to orchestrator
  └── engine (orchestrator decides)
        ├── runs metric harness
        ├── evaluates improvement
        └── git commit (keep) or git reset (discard)

The engine passes each agent:

Mutable/immutable file rules — translated to --allowedTools / --disallowedTools CLI flags
Agent profile — CLAUDE.md, rules, and settings from .autoresearch/agents/
Program — generated instructions describing what to attempt

Without claude on PATH, the engine cannot run experiments.
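
As a rough sketch of how the file rules could become CLI flags, the helper below checks for the binary and builds an argument list. Only `claude`, `--allowedTools`, and `--disallowedTools` come from the text above. The function names, the `-p` headless invocation, and the `Read` / `Edit(path)` rule strings are assumptions made for illustration, and the strings the engine really emits may differ.

```python
import shutil
from typing import List, Sequence


def require_cli(binary: str = "claude") -> str:
    """Return the resolved path of the agent CLI, or fail early."""
    path = shutil.which(binary)
    if path is None:
        raise RuntimeError(f"{binary!r} not found on PATH; cannot run experiments")
    return path


def build_agent_command(
    program: str,
    mutable: Sequence[str],
    immutable: Sequence[str],
    binary: str = "claude",
) -> List[str]:
    """Build argv for one agent run (hypothetical mapping of file rules to flags)."""
    cmd = [binary, "-p", program]  # assumption: non-interactive run with a prompt
    allowed = ["Read"] + [f"Edit({p})" for p in mutable]  # assumed rule syntax
    disallowed = [f"Edit({p})" for p in immutable]        # assumed rule syntax
    if allowed:
        cmd += ["--allowedTools", *allowed]
    if disallowed:
        cmd += ["--disallowedTools", *disallowed]
    return cmd
```

The resulting list can be handed to `subprocess.run(cmd, cwd=worktree_path)` after `require_cli()` succeeds, so a missing binary is reported before any experiment starts.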

Key Design Principles

The repo is the engine — .autoresearch/config.yaml carries everything needed to run
Agnostic — no stack-specific, provider-specific, or infrastructure-specific code
Dual-mode CLI — every command works interactively (TUI) and headlessly (JSON)
Marker-driven — markers declare what to improve; the engine executes
Worktree isolation — each marker runs in its own git worktree
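
Worktree isolation can be illustrated with plain `git worktree` commands. The sketch below is an assumption about how a per-marker worktree might be created, not a description of worktree.py. The directory layout under `.autoresearch/worktrees` and the `autoresearch/<marker>` branch naming are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def worktree_add_command(
    repo: Path,
    marker: str,
    root: str = ".autoresearch/worktrees",  # hypothetical location
) -> List[str]:
    """Return the git command that creates an isolated worktree for a marker."""
    path = repo / root / marker
    branch = f"autoresearch/{marker}"  # hypothetical branch naming
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path)]


def create_worktree(repo: Path, marker: str) -> Path:
    """Create the worktree; raises CalledProcessError if git fails."""
    cmd = worktree_add_command(repo, marker)
    subprocess.run(cmd, check=True)
    return Path(cmd[-1])
```

Because each marker gets its own checkout and branch, a `git reset` in one experiment cannot disturb the files another marker is working on.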

Module Map

Module	Purpose
`cli.py`	CLI entry point (Typer + Rich)
`engine.py`	Core experiment loop
`marker.py`	`.autoresearch/config.yaml` parser + Pydantic schema
`metrics.py`	Metric extraction + guard gates
`state.py`	`state.json` management
`worktree.py`	Git worktree lifecycle
`daemon.py`	Background daemon service
`results.py`	`results.tsv` tracking
`ideas.py`	Idea generation + history
`program.py`	Program synthesis for experiments
`config.py`	Global config (`~/.autoresearch/`)
`telemetry.py`	Run telemetry and cost tracking
`agent_profile.py`	Agent profile loading

Data Flow

`.autoresearch/config.yaml` → marker.py → engine.py → worktree.py (isolate)
                                     ↓
                              metrics.py (measure)
                                     ↓
                              results.py (record)
                                     ↓
                              state.py (persist)
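
The measure, record, and persist stages of this flow might look like the following sketch. The function names, the `metric: <number>` output pattern, and the results.tsv column names are all hypothetical. Only the file names results.tsv and state.json come from the module map.

```python
import csv
import json
import re
from pathlib import Path
from typing import Optional


def extract_metric(
    output: str,
    pattern: str = r"metric:\s*([-+]?\d+(?:\.\d+)?)",  # assumed harness output format
) -> Optional[float]:
    """Pull a single numeric metric out of harness output, or None if absent."""
    match = re.search(pattern, output)
    return float(match.group(1)) if match else None


def record_result(results: Path, iteration: int, metric: float, kept: bool) -> None:
    """Append one row to results.tsv, writing a header on first use."""
    is_new = not results.exists()
    with results.open("a", newline="") as f:
        writer = csv.writer(f, delimiter="\t")
        if is_new:
            writer.writerow(["iteration", "metric", "kept"])  # hypothetical columns
        writer.writerow([iteration, metric, int(kept)])


def persist_state(state_file: Path, state: dict) -> None:
    """Write state.json via a temp file so a crash cannot leave it half-written."""
    tmp = state_file.with_suffix(".tmp")
    tmp.write_text(json.dumps(state, indent=2))
    tmp.replace(state_file)
```

Writing the state through a temporary file and renaming it is one simple way to keep the persisted state consistent if a run is interrupted. Whether state.py does this is not stated here.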