Metrics
Metric extraction, guard gates, dual-gate verification.
Overview
The metrics module (metrics.py) handles running harness commands, extracting numeric values, and evaluating guard conditions.
Harness Execution
The metric.command from the marker is executed as a shell command in the worktree directory. Output is captured and passed through the metric.extract pattern to produce a single numeric value.
Dual-Gate Verification
Each experiment can be validated by two independent gates:
- Metric gate — Did the primary metric improve (based on
direction)? - Guard gate — Does the broader test suite still pass (based on
guard.threshold)?
Both must pass for changes to be kept.
Confidence Scoring
The engine computes confidence levels for metric measurements to account for noisy metrics. This helps distinguish genuine improvements from statistical noise.
Direction
higher— larger values are better (e.g., test pass count, coverage %)lower— smaller values are better (e.g., build time, error count)