Agenticore Documentation

Two modes, one binary. Agenticore is a production-grade Claude Code runner that operates in two distinct shapes, switched at runtime by a single environment variable.

                          ┌─── AGENT_MODE=false (default) ────────────┐
                          │  FLEET MODE — Orchestrator                 │
                          │  Submit a task, get a PR                   │
                          │                                            │
   MCP / REST / CLI ─────►│  clone repo ──► bespoke worktree           │
                          │       │              │                     │
                          │       └──► claude -p "<task>" ──► auto-PR  │
                          │                              └──► OTEL     │
                          │  KEDA-scaled fleet • work-stealing queue   │
   ┌─────────────┐        └────────────────────────────────────────────┘
   │ agenticore  │
   │   binary    │
   └─────────────┘        ┌─── AGENT_MODE=true ────────────────────────┐
                          │  AGENT MODE — Customized agent endpoint    │
                          │  Drop-in OpenAI chat completion server     │
                          │                                            │
   OpenAI-compatible ────►│  load agent package (system prompt, MCP    │
   chat clients           │    servers, hooks, skills, identity)       │
   (LibreChat,            │                                            │
    OpenWebUI,            │  POST /v1/chat/completions stream=true     │
    LiteLLM,              │       │                                    │
    custom UI,            │       └─► live SSE deltas:                 │
    raw curl -N)          │            thinking_delta (token-by-token) │
                          │            tool_use + tool_result          │
                          │            assistant text                  │
                          │                                            │
                          │  Sticky slash toggles per agent            │
                          │  Fully auditable — wire/disk/Redis layers  │
                          └────────────────────────────────────────────┘

Pick a mode

                 Fleet mode (default)                    Agent mode (AGENT_MODE=true)

  What it does   Accepts coding tasks, clones repos,     Loads a customized Claude agent package
                 runs Claude in worktrees, opens PRs     and exposes it as a chat completion
                                                         endpoint
  API surface    /jobs REST · run_task MCP ·             /v1/chat/completions — fully
                 agenticore run CLI                      OpenAI-compatible, streaming and
                                                         non-streaming
  Lifecycle      Per-job clone + worktree,               Long-lived agent identity loaded once
                 discarded after PR                      at startup
  Output         A pull request, an OTEL trace           Live SSE deltas as chat.completion.chunk
                                                         JSON, full transcript on disk
  Drop-in for    CI/CD pipelines, MCP-aware              LibreChat, OpenWebUI, LiteLLM,
                 editors, “fix this” bots                any OpenAI SDK client

Both modes share the same binary, same Docker image, same Helm chart, same profile system, same Redis+file fallback, and same OTEL trace pipeline.
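The mode switch itself is trivial to reason about. A minimal Python sketch of the dispatch (the exact truthy-string parsing here is an assumption; the docs only state AGENT_MODE=true/false with false as the default):

```python
import os

def select_mode(env=None):
    """Pick the runtime shape from the single AGENT_MODE switch.
    Falls back to fleet mode when the variable is unset or falsy."""
    env = os.environ if env is None else env
    flag = env.get("AGENT_MODE", "false").strip().lower()
    return "agent" if flag in ("1", "true", "yes") else "fleet"

print(select_mode({"AGENT_MODE": "true"}))  # agent
print(select_mode({}))                      # fleet
```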


Why it matters

You have Claude Code. You want it to do work for you programmatically. That work tends to take one of two shapes:

  1. Headless coding tasks across repos — “fix the auth bug”, “add tests”, “refactor this module”. A fleet that accepts these, clones the right repo, runs Claude in a clean worktree, and opens a PR. → Fleet mode.

  2. A customized Claude agent your other tools can talk to — a personal assistant, a domain expert, exposed as an OpenAI-compatible endpoint so LibreChat / OpenWebUI / LiteLLM / any OpenAI SDK client can drop it in as a “model”, with real-time streaming of the agent’s thinking, tool calls, and answers. → Agent mode.

Agenticore is one binary that does both. Profiles, hooks, MCP whitelists, Redis state, OTEL traces, Helm chart — all shared between the two modes.
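For the second shape, an OpenAI client just points its base URL at the agent endpoint. A minimal sketch of the request body such a client sends (host, port, and agent name are placeholders):

```python
import json

# Placeholder base URL; agent mode serves an OpenAI-compatible surface, so
# clients like LibreChat, LiteLLM, or the OpenAI SDK only override this.
BASE_URL = "http://localhost:8000/v1"

def chat_request(agent, user_text, stream=True):
    """Build the JSON body for POST /v1/chat/completions; the loaded
    agent package plays the role of the "model"."""
    return {
        "model": agent,
        "messages": [{"role": "user", "content": user_text}],
        "stream": stream,  # stream=true yields live SSE deltas
    }

print(json.dumps(chat_request("my-agent", "fix the auth bug")))
```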


The agent-mode killer feature: real-time, fully auditable streaming

In agent mode, agenticore exposes /v1/chat/completions with stream=true and pipes claude’s stdout straight through to the client as live OpenAI-format SSE deltas. Thinking blocks stream token-by-token as the model generates them. Tool calls and results stream live. Nothing is buffered until the end of the turn.

The streaming hot path runs claude --output-format stream-json --verbose --include-partial-messages, reads proc.stdout line-by-line, and dispatches each event into the appropriate SSE chunk shape. No transcript polling, no Redis indirection, no JSONL flush race.
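A simplified Python sketch of that dispatch step. The event type names and the delta field carrying thinking tokens are illustrative assumptions, not the real stream-json schema:

```python
import json

def to_sse_chunk(line, model="my-agent"):
    """Map one stream-json line from claude's stdout onto an OpenAI-style
    chat.completion.chunk SSE frame. Event shapes here are illustrative."""
    event = json.loads(line)
    delta = {}
    if event.get("type") == "thinking_delta":
        delta = {"reasoning_content": event["text"]}   # token-by-token thinking
    elif event.get("type") == "text_delta":
        delta = {"content": event["text"]}             # assistant text
    elif event.get("type") == "tool_use":
        delta = {"tool_calls": [{"function": {"name": event["name"]}}]}
    chunk = {"object": "chat.completion.chunk",
             "model": model,
             "choices": [{"index": 0, "delta": delta}]}
    return f"data: {json.dumps(chunk)}\n\n"

print(to_sse_chunk('{"type": "text_delta", "text": "Hi"}'))
```

Because each stdout line becomes an SSE frame immediately, there is no point in the pipeline where a whole turn accumulates before the client sees it.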

Visibility is controlled by deterministic slash tokens stripped server-side before claude ever sees the prompt:

  Token                              Effect
  /show-thinking / /hide-thinking    Toggle thinking visibility
  /show-tools / /hide-tools          Toggle tool_use + tool_result visibility
  /show-all / /hide-all              Toggle everything
  /stream-status                     Return current config inline as a meta SSE event

Toggles are sticky per agent in Redis (no TTL) with a file fallback, and they persist across turns. Toggle-only requests return an inline meta SSE event without spawning claude, at zero token cost.
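The token handling can be sketched as a single pass over the prompt. Config keys here are illustrative placeholders for the real per-agent state:

```python
TOGGLES = {
    "/show-thinking": ("thinking", True), "/hide-thinking": ("thinking", False),
    "/show-tools":    ("tools", True),    "/hide-tools":    ("tools", False),
}

def apply_slash_tokens(prompt, config):
    """Strip slash tokens before the prompt ever reaches claude and update
    the sticky visibility config in place."""
    kept = []
    for word in prompt.split():
        if word in TOGGLES:
            key, value = TOGGLES[word]
            config[key] = value
        elif word in ("/show-all", "/hide-all"):
            for key in config:
                config[key] = word == "/show-all"
        else:
            kept.append(word)
    return " ".join(kept), config

stripped, cfg = apply_slash_tokens("/show-thinking fix the bug",
                                   {"thinking": False, "tools": True})
print(stripped)  # fix the bug
print(cfg)       # {'thinking': True, 'tools': True}
```

When `stripped` comes back empty, the server can answer with the inline meta SSE event and skip spawning claude entirely.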

Every visible event reaches three observation surfaces simultaneously: (1) the wire (OpenAI SSE chunks), (2) claude’s transcript JSONL on disk, and (3) optionally the Redis bus for cross-process subscribers. Cross-validate all three with the bundled audit script tests/smoke/verify_streaming_pipeline.sh <agent>.
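The invariant the audit script verifies can be sketched as a simple containment check (this is an illustration of the property, not the script's actual logic):

```python
def surfaces_agree(wire, transcript, redis_bus=None):
    """Every event that went out on the wire must also appear in the
    on-disk transcript, and in the Redis bus when that layer is enabled.
    Returns the list of wire events missing from another surface."""
    missing = [e for e in wire if e not in transcript]
    if redis_bus is not None:
        missing += [e for e in wire if e not in redis_bus]
    return missing  # empty means the observation surfaces are consistent

print(surfaces_agree(["thinking", "tool_use"], ["thinking", "tool_use", "text"]))  # []
```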

→ Full reference: SSE Streaming · Self-test walkthrough


Deploy anywhere

  Mode                When to use
  Standalone          Development, single-machine workloads
  Docker Compose      Self-hosted, single-host production
  Kubernetes (Helm)   Multi-pod, autoscaling, shared repo cache, per-agent StatefulSets
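As a hypothetical Docker Compose fragment (image reference and port are placeholders; only the AGENT_MODE switch comes from the docs above):

```yaml
services:
  agenticore:
    image: agenticore:latest     # placeholder image reference
    environment:
      AGENT_MODE: "true"         # the single switch: agent mode vs fleet mode
    ports:
      - "8000:8000"              # placeholder port for /v1/chat/completions
```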

Getting Started

Architecture

  • Architecture Internals — Modules, data flow, Redis+file fallback, repo caching
  • Dual Interface — MCP + REST ASGI routing and auth middleware
  • Profile System — Directory-based profiles, agentihooks integration, materialization
  • Job Execution — Runner pipeline, lifecycle state machine, auto-PR, OTEL
  • Agent Mode — Package-based agents, completion API, SSE streaming pipeline

Deployment

Reference

  • SSE Streaming — Real-time thinking + tool deltas, slash token toggles, event schema, diagnostics, milestones
  • API Reference — MCP tools + REST endpoints with schemas
  • CLI Commands — All CLI subcommands with flags and examples
  • Configuration — All env vars, YAML config, file paths