# AutoResearch

A Claude Code wrapper that makes any codebase measurably better overnight.

## What Is This?
AutoResearch is an orchestrator built on top of Claude Code. You define a metric. Claude does the coding. AutoResearch decides what to keep.
```
autoresearch (orchestrator)
└── claude (the brain)
    ├── reads your code
    ├── forms a hypothesis
    ├── edits files
    ├── runs your metric
    └── commits if improved, reverts if not
```
Reduce lint errors. Increase test coverage. Cut build times. Fix code smells. Anything you can measure with a shell command.
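Concretely, a metric is any command whose output can be reduced to a single number. A minimal sketch of that idea in Python (the `run_metric` helper is illustrative, not part of the AutoResearch API):

```python
import subprocess

def run_metric(command: str, extract: str) -> int:
    """Run a metric command, pipe its output through an extract
    command, and parse the single number that comes out.
    Illustrative helper, not the AutoResearch API."""
    raw = subprocess.run(
        command, shell=True, capture_output=True, text=True
    ).stdout
    value = subprocess.run(
        extract, shell=True, input=raw, capture_output=True, text=True
    ).stdout.split()[0]
    return int(value)
```

With a ruff-style marker, for example, `command` would be `ruff check src/ 2>&1` and `extract` a grep that pulls out the error count.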
## Quick Start

```bash
pip install tccw-autoresearch
cd your-project
autoresearch init
```
`autoresearch init` launches Claude interactively: it scans your project, asks what you want to improve, configures the marker, and measures a baseline. Three commands from zero to running.
**Prerequisites:** Python 3.10+ and Claude Code installed.
## How It Works

A marker in `.autoresearch/config.yaml` declares what to improve:
```yaml
markers:
  - name: lint-quality
    metric:
      command: "ruff check src/ 2>&1"
      extract: "grep -oP 'Found \\K\\d+'"
      direction: lower
      baseline: 163
    target:
      mutable: ["src/**/*.py"]
      immutable: ["tests/**/*.py"]
    agent:
      budget_per_experiment: 20m
      max_experiments: 10
```
The engine creates a git worktree, spawns a Claude Code agent, measures before/after, keeps improvements, discards regressions. Every kept experiment is a commit with full audit trail.
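The keep-or-revert decision is the heart of that loop. A hedged sketch of the control flow, with illustrative callback names standing in for the real git-worktree operations:

```python
def experiment_loop(measure, run_agent, keep, revert,
                    direction="lower", max_experiments=10):
    """Sketch of the engine's core loop (names are illustrative,
    not the engine's actual API): measure, let the agent edit,
    re-measure, then commit the change or throw it away."""
    better = (lambda a, b: a < b) if direction == "lower" else (lambda a, b: a > b)
    best = measure()
    for _ in range(max_experiments):
        run_agent()       # real engine: Claude Code agent in a git worktree
        score = measure()
        if better(score, best):
            keep(score)   # real engine: git commit in the worktree
            best = score
        else:
            revert()      # real engine: discard the worktree changes
    return best
```

Because every regression is reverted, the metric can only hold steady or improve across kept experiments.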
## Production Results

Deployed on antoncore, a 3.3k-LOC Python monorepo:
| Cycle | Errors before | Errors after | Delta |
|---|---|---|---|
| 1 | 186 | 163 | -23 |
| 2 | 163 | 133 | -30 |
| 3 | 133 | 0 | -133 |
186 → 0 ruff errors in 3 cycles. Full GitHub PR audit trail.
## Documentation
| Section | Topics |
|---|---|
| Architecture | Core loop, design principles |
| Marker Config | Schema, states, lifecycle |
| Engine | Experiment loop, escalation |
| CLI | 13 commands, interactive + headless |
| Agents | Default agent, custom profiles |
| Budget Countdown | PostToolUse time awareness |
| Gates | Gate chain, auto-publish PRs |
| Ruff Harness | Production reference |
| Production Deployment | Step-by-step guide |