Axocoatl: multi-agent coordination without the orchestrator

Every multi-agent framework puts the same component at the centre: a scheduler that decides which agent runs next. LangGraph compiles your graph and walks it. CrewAI routes everything through a process, either sequential or hierarchical under a manager agent. Anthropic’s research system uses a lead agent that dispatches subagents. The frameworks disagree about nearly everything else, but somewhere in each of them, something is in charge.

Axocoatl deletes that component. It’s a Rust runtime for persistent AI agents whose coordination model is borrowed from ant colonies: agents don’t take orders from an orchestrator, they activate when signals accumulate in a shared environment. The project calls this a “stigmergic event lattice”. Whether that’s a real architectural idea or a dependency graph in an ant costume is the most interesting question about the project — and since nobody outside it has written a word about it yet, it seems worth a proper look.

One week public, two months old

Axocoatl (“The Rust runtime for self-coordinating multi-agent systems”) appeared on GitHub on 4 June 2026. As I write this, the repo is one week old, with 61 stars, 135 commits, and three releases, the latest (v0.1.2) published today. The history goes back a bit further: the changelog dates v0.1.0 to late April and notes a rename from “Nexus”, and the author, Erick Echeverria, reserved the axocoatl crate name on crates.io on 1 April. Call it two months of private development and one week in public.

There is no third-party coverage anywhere — no Hacker News submission, no Reddit thread, no blog posts that I could find. The repo even contains a launch/ directory with drafts of its own HN and Reddit announcements, so the silence looks deliberate and probably temporary.

The shape of the thing: a Cargo workspace of around twenty crates built on ractor, an Erlang-inspired actor library for Rust, with tokio underneath, axum for the HTTP API, rmcp for MCP, and wasmtime for sandboxed tool execution. Apache 2.0, zero telemetry, local-first. Six LLM providers — Ollama, OpenAI, Anthropic, Mistral, Gemini, OpenRouter — plus, since v0.1.2, any OpenAI-compatible base_url, which covers LM Studio, vLLM and MLX. No database required: by default, agent state is plain files on disk and embeddings run locally. The marketing site claims the whole runtime is one 25 MB binary, set against “100s of MB + a venv”.

The event lattice

Here’s a complete two-agent system from the repo’s starter config:

```yaml
agents:
  - id: researcher
    name: "Researcher"
    provider: ollama
    model: llama3.2
    system_prompt: "You research a question and return 3 bullet-pointed facts."
    depends_on: []
    token_budget: { per_execution: 2000, per_call: 1000, overflow_policy: warn }

  - id: summarizer
    name: "Summarizer"
    provider: ollama
    model: llama3.2
    system_prompt: "You take research bullets and write a single 2-sentence summary."
    depends_on: [researcher]
    token_budget: { per_execution: 2000, per_call: 1000, overflow_policy: warn }

workflows:
  - id: hello-world
    name: "Hello world"
    description: "Research → summarize. The simplest possible cascade."
    agents: [researcher, summarizer]
    entry_point: researcher
```

No edges, no graph-builder API, no orchestrator definition: the depends_on field is the entire coordination surface. At runtime the daemon registers each agent in an EventLattice with an activation threshold of N × 0.5, where N is its number of dependencies. When an agent finishes, it publishes a TaskCompleted event into the lattice with a signal strength of 0.5. When an agent’s accumulated signal reaches its threshold, it wakes up and receives the upstream outputs as context. The architecture doc is blunt about the consequence: “There is no scheduler. Coordination emerges from events.”
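To make the arithmetic concrete, here is a minimal sketch of the activation rule as described, not Axocoatl's actual EventLattice: each agent's threshold is N × 0.5, every TaskCompleted from one of its dependencies adds 0.5, and the agent fires (resetting to zero) once its total reaches the threshold. The Lattice type and its register and task_completed methods are hypothetical names, and decay is deliberately left out.

```rust
use std::collections::HashMap;

/// Signal strength a TaskCompleted event deposits, per the article.
const TASK_COMPLETED: f64 = 0.5;

/// One registered agent: its dependencies, activation threshold and accumulated signal.
struct Node {
    depends_on: Vec<String>,
    threshold: f64,
    intensity: f64,
}

/// Hypothetical, decay-free lattice: an agent wakes when its signal reaches its threshold.
#[derive(Default)]
pub struct Lattice {
    nodes: HashMap<String, Node>,
}

impl Lattice {
    /// Register an agent with threshold = N × 0.5, where N is its dependency count.
    pub fn register(&mut self, id: &str, depends_on: &[&str]) {
        let deps: Vec<String> = depends_on.iter().map(|d| d.to_string()).collect();
        let threshold = deps.len() as f64 * 0.5;
        self.nodes.insert(id.to_string(), Node { depends_on: deps, threshold, intensity: 0.0 });
    }

    /// Publish a TaskCompleted event from `source`; return the agents that wake up.
    pub fn task_completed(&mut self, source: &str) -> Vec<String> {
        let mut woken = Vec::new();
        for (id, node) in self.nodes.iter_mut() {
            if node.depends_on.iter().any(|d| d == source) {
                node.intensity += TASK_COMPLETED;
                if node.intensity >= node.threshold {
                    node.intensity = 0.0; // accumulated signal resets when the agent fires
                    woken.push(id.clone());
                }
            }
        }
        woken.sort(); // HashMap order is arbitrary; sort for deterministic output
        woken
    }
}
```

With the starter config, summarizer's threshold is 0.5, so the researcher's single completion wakes it. A hypothetical third agent depending on both would need two completions to reach 1.0.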

Different events carry different weights. From lattice.rs: TaskAvailable and UserInput deposit 1.0, AgentFailed 0.8, TaskCompleted 0.5, ToolResult 0.3, and WorkflowCompleted and AgentActivated 0.1. Signals decay exponentially over time (I(t) = I₀ × e^(-λt), actual pheromone maths), and an agent’s accumulated intensity resets to zero when it fires. A cycle guard caps the total activations in a run at three times the number of agents, and configs are validated as acyclic, so a feedback loop can’t melt your GPU.
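The weights and the decay formula can be written down directly. This sketch copies the weights listed above into an enum and applies I(t) = I₀ × e^(-λt); the Event enum, the function names and any particular λ are assumptions for illustration, since the article gives neither the decay rate nor how often decay is applied.

```rust
/// Event types and deposit weights as listed in the article (from lattice.rs).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Event {
    TaskAvailable,
    UserInput,
    AgentFailed,
    TaskCompleted,
    ToolResult,
    WorkflowCompleted,
    AgentActivated,
}

impl Event {
    /// Signal strength the event deposits when published.
    pub fn weight(self) -> f64 {
        match self {
            Event::TaskAvailable | Event::UserInput => 1.0,
            Event::AgentFailed => 0.8,
            Event::TaskCompleted => 0.5,
            Event::ToolResult => 0.3,
            Event::WorkflowCompleted | Event::AgentActivated => 0.1,
        }
    }
}

/// Exponential decay I(t) = I0 * e^(-lambda * t); `lambda` is a hypothetical rate per second.
pub fn decayed(i0: f64, lambda: f64, t_secs: f64) -> f64 {
    i0 * (-lambda * t_secs).exp()
}

/// Cycle guard: a run may perform at most agents × 3 activations in total.
pub fn activation_cap(agent_count: usize) -> usize {
    agent_count * 3
}
```

With a hypothetical λ of 0.1 per second, a signal halves every ln 2 / λ ≈ 6.9 seconds, which is the sense in which stale signals fade.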

A second layer, skills, declares event types directly:

```yaml
skills:
  - id: quick-research
    name: "Quick research"
    description: "Fire a one-shot research request into the lattice."
    emits: ["ResearchRequested"]
    reacts_to: []
    agents: [researcher]
    prompt: "Research this topic in 3 bullets."
```

The same machinery drives proactive agents: because activation is just “signal crossed threshold”, an agent can react to a schedule tick or an AgentFailed event with no user in the loop. That’s where this stops being a workflow engine and starts being a runtime.
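Mechanically, "no user in the loop" means any event source can push an agent over its threshold. A sketch, assuming agents subscribe to event types by name the way the reacts_to field suggests; the Subscriber type and its fields are hypothetical, and the weight of a custom event such as ResearchRequested is not documented in the article.

```rust
use std::collections::HashMap;

/// Hypothetical subscriber: the event names it reacts to, its threshold and current signal.
pub struct Subscriber {
    pub reacts_to: Vec<&'static str>,
    pub threshold: f64,
    pub intensity: f64,
}

/// Deposit `weight` on every agent subscribed to `event`; return the ids that wake.
/// Nothing here depends on a user request: any event source can trigger activation.
pub fn publish(
    agents: &mut HashMap<&'static str, Subscriber>,
    event: &str,
    weight: f64,
) -> Vec<&'static str> {
    let mut woken = Vec::new();
    for (id, agent) in agents.iter_mut() {
        if agent.reacts_to.iter().any(|e| *e == event) {
            agent.intensity += weight;
            if agent.intensity >= agent.threshold {
                agent.intensity = 0.0; // reset on firing, as the lattice does
                woken.push(*id);
            }
        }
    }
    woken.sort(); // deterministic order for callers
    woken
}
```

For example, a hypothetical watchdog agent with reacts_to: [AgentFailed] and a threshold of 0.8 wakes the moment any agent fails, with no request from anyone.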

Kill the daemon mid-task

Each agent runs as a ractor actor and checkpoints its state to disk after every turn, keeping the last three. A supervision loop in the daemon watches agent liveness; if an agent panics, or its provider returns an unrecoverable error, it gets restarted from its latest checkpoint, restoring its session, token usage, and behaviour state. The README’s demo is exactly what you’d hope for: kill the server mid-task, restart, and the agent resumes from its checkpoint rather than from zero.
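One snapshot per turn with only the last three kept is a bounded queue. A minimal in-memory sketch, assuming a generic snapshot type; Axocoatl writes its checkpoints to disk, and the CheckpointStore name and methods are hypothetical.

```rust
use std::collections::VecDeque;

/// Keep at most this many checkpoints, as the article describes.
const MAX_CHECKPOINTS: usize = 3;

/// Hypothetical checkpoint store: newest at the back, oldest pruned first.
pub struct CheckpointStore<T> {
    snapshots: VecDeque<(u64, T)>, // (turn number, state snapshot)
}

impl<T> CheckpointStore<T> {
    pub fn new() -> Self {
        Self { snapshots: VecDeque::new() }
    }

    /// Record the state after a turn, pruning the oldest snapshots beyond the limit.
    pub fn save(&mut self, turn: u64, state: T) {
        self.snapshots.push_back((turn, state));
        while self.snapshots.len() > MAX_CHECKPOINTS {
            self.snapshots.pop_front();
        }
    }

    /// Latest checkpoint to restart from after a crash, if any.
    pub fn latest(&self) -> Option<&(u64, T)> {
        self.snapshots.back()
    }

    /// Turn numbers currently retained, oldest first.
    pub fn turns(&self) -> Vec<u64> {
        self.snapshots.iter().map(|(turn, _)| *turn).collect()
    }
}
```

After five turns only turns 3, 4 and 5 remain, and a restart resumes from turn 5.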

The design note that suggests someone thought this through: “A checkpoint is a regenerable cache, never a source of truth.” Corrupt checkpoints are discarded with a warning instead of blocking startup. Anyone who has watched a system refuse to boot over one bad cache file will appreciate that sentence.
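Treating a checkpoint as a regenerable cache changes the restore path: try snapshots newest first, skip any that fail to parse with a warning, and start fresh if none survive. The toy key=value format, the AgentState fields and the function names below are hypothetical; only the policy comes from the design note.

```rust
/// Hypothetical snapshot holding two of the fields the article says are restored.
#[derive(Debug, PartialEq)]
pub struct AgentState {
    pub session_turns: u32,
    pub tokens_used: u64,
}

/// Parse a toy `key=value` checkpoint; any malformed or missing field is an error.
fn parse_checkpoint(raw: &str) -> Result<AgentState, String> {
    let mut turns: Option<u32> = None;
    let mut tokens: Option<u64> = None;
    for line in raw.lines() {
        let (key, value) = line.split_once('=').ok_or("missing '='")?;
        match key {
            "session_turns" => turns = Some(value.parse::<u32>().map_err(|e| e.to_string())?),
            "tokens_used" => tokens = Some(value.parse::<u64>().map_err(|e| e.to_string())?),
            other => return Err(format!("unknown key '{}'", other)),
        }
    }
    Ok(AgentState {
        session_turns: turns.ok_or("missing session_turns")?,
        tokens_used: tokens.ok_or("missing tokens_used")?,
    })
}

/// Try checkpoints newest first; a corrupt one is discarded with a warning, never fatal.
pub fn restore(newest_first: &[&str]) -> Option<AgentState> {
    for (i, raw) in newest_first.iter().enumerate() {
        match parse_checkpoint(raw) {
            Ok(state) => return Some(state),
            Err(e) => eprintln!("warning: discarding corrupt checkpoint {}: {}", i, e),
        }
    }
    None // nothing usable: start fresh instead of refusing to boot
}
```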

Four tiers of memory

Tier	What	Persistence
1 — Session	conversation transcript	in-memory
2 — Checkpoint	agent state snapshots	disk (pruned to 3)
3 — Core memory	agent-edited curated blocks	disk (JSON, per-agent + shared)
4 — Semantic	neural vector recall	disk (embeddings)

Tier 4 needs no external services. Embeddings come from all-MiniLM-L6-v2 (384 dimensions) running on Candle, Hugging Face’s pure-Rust ML framework; the ~90 MB model downloads once and inference is local. No Pinecone, no API key for memory.

Tier 3 is the v0.1.2 headline, and the architecture doc credits the design honestly: it’s the MemGPT/Letta model. Agents edit their own curated memory blocks (persona, human, and project by default) through the core_memory_append, core_memory_replace, and core_memory_set tools. Blocks have character limits and can be marked shared: true for cross-agent team memory. A background “sleep-time consolidation” pass promotes durable facts from semantic memory into core blocks while agents are idle, which is roughly what your hippocampus does overnight.
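The three core-memory tools amount to bounded edits on a labelled text block. This sketch assumes that an edit which would push a block past its character limit is rejected with an error the agent can read, rather than silently truncated; the article does not say which Axocoatl does, and the CoreBlock type and its methods are hypothetical.

```rust
/// Hypothetical core-memory block: a labelled, character-limited text the agent edits itself.
pub struct CoreBlock {
    pub label: String,
    pub value: String,
    pub limit: usize, // maximum characters
    pub shared: bool, // `shared: true` would expose the block to the whole team
}

impl CoreBlock {
    /// Reject any candidate value longer than the block's character limit.
    fn check(&self, candidate: &str) -> Result<(), String> {
        let len = candidate.chars().count();
        if len > self.limit {
            Err(format!("block '{}' would be {} chars, limit {}", self.label, len, self.limit))
        } else {
            Ok(())
        }
    }

    /// core_memory_append: add a line to the end of the block.
    pub fn append(&mut self, content: &str) -> Result<(), String> {
        let candidate = if self.value.is_empty() {
            content.to_string()
        } else {
            format!("{}\n{}", self.value, content)
        };
        self.check(&candidate)?;
        self.value = candidate;
        Ok(())
    }

    /// core_memory_replace: swap the first exact match of `old` for `new`.
    pub fn replace(&mut self, old: &str, new: &str) -> Result<(), String> {
        if !self.value.contains(old) {
            return Err(format!("'{}' not found in block '{}'", old, self.label));
        }
        let candidate = self.value.replacen(old, new, 1);
        self.check(&candidate)?;
        self.value = candidate;
        Ok(())
    }

    /// core_memory_set: overwrite the whole block.
    pub fn set(&mut self, content: &str) -> Result<(), String> {
        self.check(content)?;
        self.value = content.to_string();
        Ok(())
    }
}
```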

Recall is hybrid. Top-k semantic hits are injected passively each turn (defaults: top_k: 5, min_score: 0.15), and agents get explicit tools on top: recall_search for semantic queries, recall_timeframe for “what happened on 9 June”.
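Passive recall with those defaults reduces to: score every stored memory against the current turn, drop anything under min_score, keep the best top_k. A sketch over plain vectors, assuming cosine similarity as the score (the usual choice for sentence embeddings, though the article does not say); real entries would be 384-dimensional MiniLM vectors, and passive_recall is a hypothetical name.

```rust
/// Cosine similarity between two equal-length vectors (0.0 if either is all zeros).
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Return up to `top_k` (memory index, score) pairs scoring at least `min_score`, best first.
pub fn passive_recall(
    query: &[f32],
    memories: &[Vec<f32>],
    top_k: usize,
    min_score: f32,
) -> Vec<(usize, f32)> {
    let mut hits: Vec<(usize, f32)> = memories
        .iter()
        .enumerate()
        .map(|(i, m)| (i, cosine(query, m)))
        .filter(|&(_, score)| score >= min_score)
        .collect();
    hits.sort_by(|a, b| b.1.total_cmp(&a.1)); // descending by score
    hits.truncate(top_k);
    hits
}
```

Called with top_k = 5 and min_score = 0.15, an orthogonal memory (score 0) is filtered out however few hits remain, so the passive context only ever shrinks below five.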

Budgets enforced before the call

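```rust
// Assumed enum shape: the excerpt only names the two variants and the default.
#[derive(Debug, Clone, Copy, Default, PartialEq)]
pub enum OverflowPolicy { #[default] Abort, Warn }

pub struct TokenBudget {
    pub per_call: usize,
    pub per_execution: usize,
    pub overflow_policy: OverflowPolicy, // Abort (default) or Warn
}
```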

Agents carry optional per-call and per-execution token budgets, and as of v0.1.2 a configured budget is enforced by default: overflow aborts rather than warns unless overflow_policy says otherwise (the starter config above opts into warn). The line from the architecture doc worth quoting: “Budgets are checked before the LLM call, so an over-budget request never costs tokens.” Metering after the fact gives you a report; checking before the call gives you a budget. The axocoatl tokens report command, or GET /api/tokens/report, breaks usage down per agent.
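A pre-flight check only needs an estimate of the request's size and the usage so far. This sketch mirrors the TokenBudget struct above and assumes the shape of OverflowPolicy; the four-characters-per-token estimate is a crude placeholder rather than how Axocoatl counts, and preflight and Preflight are hypothetical names.

```rust
/// Assumed shape of the policy enum named in the struct above.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum OverflowPolicy {
    Abort,
    Warn,
}

pub struct TokenBudget {
    pub per_call: usize,
    pub per_execution: usize,
    pub overflow_policy: OverflowPolicy,
}

/// Outcome of the check, decided before any provider is called.
#[derive(Debug, PartialEq)]
pub enum Preflight {
    Proceed,
    ProceedWithWarning,
    Abort,
}

/// Crude estimate: about four characters per prompt token, plus the output cap.
pub fn estimate_tokens(prompt: &str, max_output: usize) -> usize {
    (prompt.chars().count() + 3) / 4 + max_output
}

/// Refuse (or warn about) a request that would breach either limit, so it costs nothing.
pub fn preflight(budget: &TokenBudget, used_this_execution: usize, estimated: usize) -> Preflight {
    let over_call = estimated > budget.per_call;
    let over_execution = used_this_execution + estimated > budget.per_execution;
    if !(over_call || over_execution) {
        Preflight::Proceed
    } else if budget.overflow_policy == OverflowPolicy::Warn {
        Preflight::ProceedWithWarning
    } else {
        Preflight::Abort
    }
}
```

With the starter config's numbers (per_call 1000, per_execution 2000), an 800-token request after 1,500 tokens already spent is refused under Abort before any provider sees it, and let through with a warning under Warn.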

Budgets also feed back into coordination. An agent with role: coordinator runs hierarchical task network (HTN) decomposition over a compound task, then assigns the pieces by auction: workers bid, and bid scoring weighs tool-capability match against remaining token budget. An agent close to its spend cap loses auctions to one with headroom. Whether HTN-plus-auctions is more machinery than a v0.1 project needs is a fair question, but budget-aware agent selection is not something I’ve seen elsewhere.
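The article does not give the bid formula, so here is a hypothetical one that captures the stated trade-off: the share of a subtask's required tools a worker has, blended with the share of its token budget still unspent. The 0.6/0.4 weights, the Worker fields and the tie-break are assumptions, and the HTN decomposition step is not shown.

```rust
/// Hypothetical worker state as seen by a coordinator running the auction.
pub struct Worker {
    pub id: &'static str,
    pub tools: Vec<&'static str>,
    pub budget_total: usize,
    pub budget_used: usize,
}

/// Assumed weights: capability match counts for more than budget headroom.
const CAPABILITY_WEIGHT: f64 = 0.6;
const HEADROOM_WEIGHT: f64 = 0.4;

/// Score in [0, 1]: share of required tools the worker has, blended with unspent budget share.
pub fn bid(worker: &Worker, required_tools: &[&str]) -> f64 {
    let capability = if required_tools.is_empty() {
        1.0
    } else {
        let matched = required_tools
            .iter()
            .filter(|&&tool| worker.tools.iter().any(|&have| have == tool))
            .count();
        matched as f64 / required_tools.len() as f64
    };
    let headroom = if worker.budget_total == 0 {
        0.0
    } else {
        worker.budget_total.saturating_sub(worker.budget_used) as f64 / worker.budget_total as f64
    };
    CAPABILITY_WEIGHT * capability + HEADROOM_WEIGHT * headroom
}

/// Award the subtask to the highest bidder; strict `>` keeps the earlier worker on ties.
pub fn award<'a>(workers: &'a [Worker], required_tools: &[&str]) -> Option<&'a Worker> {
    let mut best: Option<(&'a Worker, f64)> = None;
    for w in workers {
        let score = bid(w, required_tools);
        if best.map_or(true, |(_, top)| score > top) {
            best = Some((w, score));
        }
    }
    best.map(|(w, _)| w)
}
```

Two workers with the same tools but 100 versus 1,800 tokens of headroom out of 2,000 score 0.62 and 0.96, so the one with headroom wins the auction.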

Does stigmergy actually solve anything?

The multi-agent argument of the past year, compressed: in June 2025 Cognition published “Don’t Build Multi-Agents”, arguing that parallel agents make conflicting implicit decisions and fragility compounds. The same week, Anthropic described its production research system, in which a lead agent dispatching subagents beat single-agent Claude Opus 4 by 90.2% on an internal eval — while noting that multi-agent systems burn about 15× more tokens than ordinary chat. By April 2026 Cognition had softened its position: narrow multi-agent patterns work, provided writes stay single-threaded.

The academic record points at where these systems break. “Why Do Multi-Agent LLM Systems Fail?”, the paper behind the MAST taxonomy, annotated over 1,600 execution traces across seven frameworks and found that roughly a third of failures were inter-agent misalignment: agents misunderstanding, duplicating, or undoing each other’s work. More interesting for Axocoatl, a 2025 paper on blackboard-style coordination (no task assignment at all; requests are broadcast to a shared space and agents self-select) reported 13–57% relative improvement in end-to-end success over the best baseline on data-discovery benchmarks. Environment-mediated coordination has real research behind it and almost no production implementations; Axocoatl is one of the first attempts at one.

So where does it land? An honest reading: for a plain depends_on workflow, the pheromone arithmetic collapses to dependency counting. The threshold is N × 0.5 and each dependency’s completion is worth exactly 0.5, so, setting decay aside, the agent fires exactly when all its dependencies are done. That is a topological sort in ant clothes, implementable with an integer counter. The stigmergic framing earns its keep only past the DAG: differently weighted events (a failure shouts louder than a tool result), decay (stale signals fade), and proactive agents that react to anything in the event stream rather than to slots in a pre-compiled graph.
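The collapse is easy to check in code. With decay set aside, "0.5 per finished dependency reaches N × 0.5" is the same test as "finished dependencies == N", which is Kahn's topological sort driven by a per-agent integer counter. A sketch; activation_order is a hypothetical function, not part of Axocoatl.

```rust
use std::collections::{HashMap, VecDeque};

/// Kahn's algorithm: each agent counts its unfinished dependencies and becomes runnable
/// at zero, the integer form of "accumulated signal reached N × 0.5".
/// Returns None if the dependencies contain a cycle (or name an unknown agent).
pub fn activation_order(agents: &[(&str, &[&str])]) -> Option<Vec<String>> {
    let mut remaining: HashMap<&str, usize> = HashMap::new();
    let mut dependents: HashMap<&str, Vec<&str>> = HashMap::new();
    for &(id, deps) in agents {
        remaining.insert(id, deps.len());
        for &dep in deps {
            dependents.entry(dep).or_default().push(id);
        }
    }
    // Agents with no dependencies are runnable immediately, in config order.
    let mut ready: VecDeque<&str> = agents
        .iter()
        .filter(|(_, deps)| deps.is_empty())
        .map(|&(id, _)| id)
        .collect();
    let mut order = Vec::new();
    while let Some(id) = ready.pop_front() {
        order.push(id.to_string());
        for &next in dependents.get(id).map(Vec::as_slice).unwrap_or(&[]) {
            let count = remaining.get_mut(next).expect("every dependent is a registered agent");
            *count -= 1; // one TaskCompleted: the "0.5" in integer form
            if *count == 0 {
                ready.push_back(next);
            }
        }
    }
    (order.len() == agents.len()).then_some(order)
}
```

For the starter config it returns researcher then summarizer; for a cycle it returns None, the integer-counter analogue of Axocoatl rejecting a config that is not acyclic.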

The strongest objection comes from a Hacker News thread on the Cognition piece, where user ramchip pointed out that the Erlang comparison everyone reaches for is misleading: supervisors isolate failures, they don’t decompose tasks or combine results. That applies squarely here. Axocoatl’s supervision-and-checkpoint layer convincingly answers “will my agents survive a crash”. It cannot answer “will five cheap agents produce something better than one good one” — no runtime can, because that’s a property of how the work is sliced. MAST’s biggest failure bucket, system design issues at roughly 44%, lands on whoever writes the YAML. That’s still you.

Should you try it?

Things to know first.

It’s a week old in public. v0.1.x is churning fast: v0.1.2 reworked the entire Tier-3 memory model, and v0.1.1 fixed tool calling that was substantially broken in v0.1.0 — only Ollama even received tool definitions, and multi-turn tool conversations failed on every provider. A normal trajectory for new software, but it tells you where on the curve you are.

Docs lag the code. The deployed memory documentation still describes the pre-0.1.2 key-value design; the in-repo ARCHITECTURE.md is the current source of truth.

Adoption is a rounding error. crates.io downloads are in the hundreds. If you adopt this, you are the community.

If none of that puts you off:

```bash
curl -fsSL https://raw.githubusercontent.com/axocoatl/axocoatl/main/scripts/install.sh | sh
# or: cargo install axocoatl-cli  (needs Rust 1.82+)

axocoatl onboard   # interactive wizard: pick a provider, scaffold a project
axocoatl doctor    # environment check: config, Ollama, data dir, daemon
axocoatl dev       # start the daemon + dashboard
axocoatl chat -a assistant
```

The fit is specific: persistent agents on your own hardware, talking to Ollama, surviving reboots, no telemetry, no Python environment. For a homelab box that’s an attractive package — one binary, local embeddings included.

I don’t know whether stigmergic coordination beats an orchestrator. Neither does anyone else — the comparative evidence doesn’t exist yet, and Axocoatl is too young to generate it. What it offers in the meantime is solid engineering — supervised restarts, pre-flight budgets, local-first memory — wrapped around a coordination idea the research community is only starting to take seriously. Worth an hour and a git clone, if only to watch the lattice pulse.