AutoResearch

A Claude Code wrapper that makes any codebase measurably better overnight.


What Is This?

AutoResearch is an orchestrator built on top of Claude Code. You define a metric. Claude does the coding. AutoResearch decides what to keep.

autoresearch (orchestrator)
  └── claude (the brain)
        ├── reads your code
        ├── forms a hypothesis
        ├── edits files
        ├── runs your metric
        └── commits if improved, reverts if not

Reduce lint errors. Increase test coverage. Cut build times. Fix code smells. Anything you can measure with a shell command.
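The commit-if-improved loop above can be sketched in a few lines of Python (hypothetical names; the real orchestrator's internals differ):

```python
import subprocess

def measure(cmd: str) -> int:
    """Run the metric command and parse its numeric stdout."""
    out = subprocess.run(cmd, shell=True, capture_output=True, text=True).stdout
    return int(out.strip())

def experiment_loop(measure_fn, run_agent, keep, revert, baseline, max_experiments=10):
    """Run agents; keep edits that lower the metric, revert the rest."""
    best = baseline
    for _ in range(max_experiments):
        run_agent()          # Claude forms a hypothesis and edits files
        score = measure_fn()
        if score < best:     # direction: lower -- smaller is better
            keep(score)      # e.g. git commit
            best = score
        else:
            revert()         # e.g. git checkout -- .
    return best
```

Injecting `keep` and `revert` as callables keeps the loop itself free of git plumbing, which is what makes "anything you can measure with a shell command" a sufficient interface.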


Quick Start

pip install tccw-autoresearch
cd your-project
autoresearch init

Claude opens interactively, scans your project, asks what to improve, configures the marker, and measures a baseline. Three commands take you from zero to running.

Prerequisites: Python 3.10+ and Claude Code installed.


How It Works

A marker in .autoresearch/config.yaml declares what to improve:

markers:
  - name: lint-quality
    metric:
      command: "ruff check src/ 2>&1"
      extract: "grep -oP 'Found \\K\\d+'"
      direction: lower
      baseline: 163
    target:
      mutable: ["src/**/*.py"]
      immutable: ["tests/**/*.py"]
    agent:
      budget_per_experiment: 20m
      max_experiments: 10

The engine creates a git worktree, spawns a Claude Code agent, measures the metric before and after, keeps improvements, and discards regressions. Every kept experiment is a commit with a full audit trail.


Production Results

Deployed on antoncore (3.3k LOC Python monorepo):

Cycle  Errors before  Errors after  Delta
1      186            163           -23
2      163            133           -30
3      133            0             -133

186 → 0 ruff errors in 3 cycles. Full GitHub PR audit trail.


Documentation

Section                 Topics
Architecture            Core loop, design principles
Marker Config           Schema, states, lifecycle
Engine                  Experiment loop, escalation
CLI                     13 commands, interactive + headless
Agents                  Default agent, custom profiles
Budget Countdown        PostToolUse time awareness
Gates                   Gate chain, auto-publish PRs
Ruff Harness            Production reference
Production Deployment   Step-by-step guide