# AutoResearch

A Claude Code wrapper that makes any codebase measurably better overnight.

## What Is This?
AutoResearch is an orchestrator built on top of Claude Code. You define a metric. Claude does the coding. AutoResearch decides what to keep.
```
autoresearch (orchestrator)
└── claude (the brain)
    ├── reads your code
    ├── forms a hypothesis
    ├── edits files
    ├── runs your metric
    └── commits if improved, reverts if not
```
Reduce lint errors. Increase test coverage. Cut build times. Fix code smells. Anything you can measure with a shell command.
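Concretely, a metric is any command whose output can be reduced to a single number. A minimal sketch of that idea in Python (the `run_metric` helper is illustrative, not part of the AutoResearch API):

```python
import subprocess

def run_metric(command: str, extract: str) -> int:
    """Run a metric command, pipe its output through an extract
    command, and parse the single number that comes out.
    Illustrative helper, not the AutoResearch API."""
    raw = subprocess.run(
        command, shell=True, capture_output=True, text=True
    ).stdout
    value = subprocess.run(
        extract, shell=True, input=raw, capture_output=True, text=True
    ).stdout.split()[0]
    return int(value)
```

With a ruff-style marker, for example, `command` would be `ruff check src/ 2>&1` and `extract` a grep that pulls out the error count.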
## Quick Start

```bash
pip install tccw-autoresearch
cd your-project
autoresearch init
```
`autoresearch init` launches Claude interactively: it scans your project, asks what you want to improve, configures the marker, and measures a baseline. Three commands from zero to running.
**Prerequisites:** Python 3.10+ and Claude Code installed.
## How It Works

A marker in `.autoresearch/config.yaml` declares what to improve:
```yaml
markers:
  - name: lint-quality
    metric:
      command: "ruff check src/ 2>&1"
      extract: "grep -oP 'Found \\K\\d+'"
      direction: lower
      baseline: 163
    target:
      mutable: ["src/**/*.py"]
      immutable: ["tests/**/*.py"]
    agent:
      budget_per_experiment: 20m
      max_experiments: 10
```
The engine creates a git worktree, spawns a Claude Code agent, measures before/after, keeps improvements, discards regressions. Every kept experiment is a commit with full audit trail.
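The keep-or-revert decision is the heart of that loop. A hedged sketch of the control flow, with illustrative callback names standing in for the real git-worktree operations:

```python
def experiment_loop(measure, run_agent, keep, revert,
                    direction="lower", max_experiments=10):
    """Sketch of the engine's core loop (names are illustrative,
    not the engine's actual API): measure, let the agent edit,
    re-measure, then commit the change or throw it away."""
    better = (lambda a, b: a < b) if direction == "lower" else (lambda a, b: a > b)
    best = measure()
    for _ in range(max_experiments):
        run_agent()       # real engine: Claude Code agent in a git worktree
        score = measure()
        if better(score, best):
            keep(score)   # real engine: git commit in the worktree
            best = score
        else:
            revert()      # real engine: discard the worktree changes
    return best
```

Because every regression is reverted, the metric can only hold steady or improve across kept experiments.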
## Production Results

Deployed on antoncore, a 3.3k-LOC Python monorepo:
| Cycle | Errors before | Errors after | Delta |
|---|---|---|---|
| 1 | 186 | 163 | -23 |
| 2 | 163 | 133 | -30 |
| 3 | 133 | 0 | -133 |
186 → 0 ruff errors in 3 cycles. Full GitHub PR audit trail.
## Documentation
| Section | Topics |
|---|---|
| Architecture | Core loop, design principles |
| Marker Config | Schema, states, lifecycle |
| Engine | Experiment loop, escalation |
| CLI | 13 commands, interactive + headless |
| Agents | Default agent, custom profiles |
| Budget Countdown | PostToolUse time awareness |
| Gates | Gate chain, auto-publish PRs |
| Ruff Harness | Production reference |
| Production Deployment | Step-by-step guide |