# Agenticore Documentation
Two modes, one binary. Agenticore is a production-grade Claude Code runner that operates in two distinct shapes — switched at runtime by a single environment variable.
```
                      ┌─── AGENT_MODE=false (default) ─────────────┐
                      │ FLEET MODE — Orchestrator                  │
                      │ Submit a task, get a PR                    │
                      │                                            │
 MCP / REST / CLI ───►│ clone repo ──► bespoke worktree            │
                      │                     │                      │
                      │                     └──► claude -p "<task>"│
                      │                            │               │
                      │                            ├──► auto-PR    │
                      │                            └──► OTEL       │
                      │ KEDA-scaled fleet • work-stealing queue    │
  ┌─────────────┐     └────────────────────────────────────────────┘
  │ agenticore  │
  │   binary    │     ┌─── AGENT_MODE=true ────────────────────────┐
  └─────────────┘     │ AGENT MODE — Customized agent endpoint     │
                      │ Drop-in OpenAI chat completion server      │
                      │                                            │
 OpenAI-compatible ──►│ load agent package (system prompt, MCP     │
 chat clients         │ servers, hooks, skills, identity)          │
 (LibreChat,          │                                            │
  OpenWebUI,          │ POST /v1/chat/completions  stream=true     │
  LiteLLM,            │   │                                        │
  custom UI,          │   └─► live SSE deltas:                     │
  raw curl -N)        │         thinking_delta (token-by-token)    │
                      │         tool_use + tool_result             │
                      │         assistant text                     │
                      │                                            │
                      │ Sticky slash toggles per agent             │
                      │ Fully auditable — wire/disk/Redis layers   │
                      └────────────────────────────────────────────┘
```
## Pick a mode
| | Fleet mode (default) | Agent mode (AGENT_MODE=true) |
|---|---|---|
| What it does | Accepts coding tasks, clones repos, runs Claude in worktrees, opens PRs | Loads a customized Claude agent package and exposes it as a chat completion endpoint |
| API surface | /jobs REST · run_task MCP · agenticore run CLI | /v1/chat/completions — fully OpenAI-compatible, streaming and non-streaming |
| Lifecycle | Per-job clone + worktree, discarded after PR | Long-lived agent identity loaded once at startup |
| Output | A pull request, an OTEL trace | Live SSE deltas as chat.completion.chunk JSON, full transcript on disk |
| Drop-in for | CI/CD pipelines, MCP-aware editors, “fix this” bots | LibreChat, OpenWebUI, LiteLLM, any OpenAI SDK client |
Both modes share the same binary, same Docker image, same Helm chart, same profile system, same Redis+file fallback, and same OTEL trace pipeline.
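On the fleet-mode side of the table, a job submission is just a REST call to the /jobs endpoint. Here is a minimal sketch in Python; the payload field names (repo, task, base_branch), the host/port, and the response shape are assumptions for illustration — the actual schema lives in the API Reference:

```python
import json

# Hypothetical payload fields for the /jobs endpoint; check the
# API Reference for the real request schema.
def build_job_payload(repo: str, task: str, base_branch: str = "main") -> dict:
    """Build a fleet-mode job submission body."""
    return {"repo": repo, "task": task, "base_branch": base_branch}

payload = build_job_payload("git@github.com:acme/api.git", "fix the auth bug")
print(json.dumps(payload))

# Submitting it against a running server (assumed host/port) would look like:
#   import urllib.request
#   req = urllib.request.Request("http://localhost:8000/jobs",
#                                data=json.dumps(payload).encode(),
#                                headers={"Content-Type": "application/json"})
#   urllib.request.urlopen(req)
```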
## Why it matters
You have Claude Code and want it to do work for you programmatically. That work tends to take one of two shapes:
- Headless coding tasks across repos — “fix the auth bug”, “add tests”, “refactor this module”. A fleet that accepts these, clones the right repo, runs Claude in a clean worktree, and opens a PR. → Fleet mode.
- A customized Claude agent your other tools can talk to — a personal assistant, a domain expert, exposed as an OpenAI-compatible endpoint so LibreChat / OpenWebUI / LiteLLM / any OpenAI SDK client can drop it in as a “model”, with real-time streaming of the agent’s thinking, tool calls, and answers. → Agent mode.
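The second shape is exactly what makes the drop-in claim work: an OpenAI client only needs a base URL and a model name. A sketch of the request body such a client sends; the model name, base URL, and API key below are placeholders for illustration, not values Agenticore prescribes:

```python
def chat_request(messages: list[dict], model: str = "my-agent",
                 stream: bool = True) -> dict:
    """Body an OpenAI-compatible client POSTs to /v1/chat/completions."""
    return {"model": model, "messages": messages, "stream": stream}

req = chat_request([{"role": "user", "content": "Summarize today's alerts"}])

# With the official OpenAI SDK the same call is (base URL / key assumed):
#   from openai import OpenAI
#   client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
#   for chunk in client.chat.completions.create(**req):
#       print(chunk.choices[0].delta)
```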
Agenticore is one binary that does both. Profiles, hooks, MCP whitelists, Redis state, OTEL traces, Helm chart — all shared between the two modes.
## The agent-mode killer feature: real-time, fully auditable streaming
In agent mode, agenticore exposes /v1/chat/completions with stream=true and pipes claude’s stdout directly through to the client as live OpenAI-format SSE deltas. Thinking blocks stream token-by-token as the model generates them. Tool calls and results stream live. Nothing is buffered to the end of the turn.
The streaming hot path runs claude --output-format stream-json --verbose --include-partial-messages, reads proc.stdout line-by-line, and dispatches each event into the appropriate SSE chunk shape. No transcript polling, no Redis indirection, no JSONL flush race.
Visibility is controlled by deterministic slash tokens stripped server-side before claude ever sees the prompt:
| Token | Effect |
|---|---|
| /show-thinking / /hide-thinking | Toggle thinking visibility |
| /show-tools / /hide-tools | Toggle tool_use + tool_result visibility |
| /show-all / /hide-all | Toggle everything |
| /stream-status | Return current config inline as a meta SSE event |
Toggles are sticky per agent, stored in Redis (no TTL) with a file fallback, and carry across turns. A toggle-only request returns an inline meta SSE event without spawning claude — zero token cost.
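A minimal sketch of the server-side stripping, assuming toggles arrive as standalone whitespace-delimited tokens (the real parser may be stricter about placement):

```python
TOGGLE_TOKENS = {
    "/show-thinking", "/hide-thinking",
    "/show-tools", "/hide-tools",
    "/show-all", "/hide-all",
    "/stream-status",
}

def strip_toggles(prompt: str) -> tuple[str, list[str]]:
    """Split slash toggles out of a prompt so claude never sees them.

    Rejoining on single spaces collapses whitespace -- acceptable for a sketch.
    """
    words = prompt.split()
    toggles = [w for w in words if w in TOGGLE_TOKENS]
    cleaned = " ".join(w for w in words if w not in TOGGLE_TOKENS)
    return cleaned, toggles
```

A prompt consisting only of toggle tokens yields an empty cleaned prompt, which is the case the server answers with inline meta SSE instead of spawning claude.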
Every visible event reaches three observation surfaces simultaneously: (1) the wire (OpenAI SSE chunks), (2) claude’s transcript JSONL on disk, and (3) optionally the Redis bus for cross-process subscribers. Cross-validate all three with the bundled audit script tests/smoke/verify_streaming_pipeline.sh <agent>.
→ Full reference: SSE Streaming · Self-test walkthrough
## Deploy anywhere
| Mode | When to use |
|---|---|
| Standalone | Development, single-machine workloads |
| Docker Compose | Self-hosted, single-host production |
| Kubernetes (Helm) | Multi-pod, autoscaling, shared repo cache, per-agent StatefulSets |
## Getting Started
- Quickstart — Install, start the server, submit your first job
- Connecting Clients — MCP, REST, and CLI client setup
- Test Streaming — Port-forward an agent pod and watch thinking + tool calls stream live
## Architecture
- Architecture Internals — Modules, data flow, Redis+file fallback, repo caching
- Dual Interface — MCP + REST ASGI routing and auth middleware
- Profile System — Directory-based profiles, agentihooks integration, materialization
- Job Execution — Runner pipeline, lifecycle state machine, auto-PR, OTEL
- Agent Mode — Package-based agents, completion API, SSE streaming pipeline
## Deployment
- Docker Compose — Multi-service stack, volumes, networking
- Kubernetes — StatefulSet, shared RWX PVC, KEDA autoscaling, graceful drain
- OTEL Pipeline — Collector config, PostgreSQL sink, Langfuse traces
- Releases and CI/CD — Versioning, tests, linting, self-update
## Reference
- SSE Streaming — Real-time thinking + tool deltas, slash token toggles, event schema, diagnostics, milestones
- API Reference — MCP tools + REST endpoints with schemas
- CLI Commands — All CLI subcommands with flags and examples
- Configuration — All env vars, YAML config, file paths