Module 4.6
Multi-Agent Orchestration
What Is Multi-Agent Orchestration?
Multi-agent orchestration is running more than one specialized AI agent under one roof — each with its own identity, role, and permissions — and routing work between them deliberately.
You've already seen how to define multiple agents in config. But defining them is the easy part. Orchestration is about how they work together: how the right task reaches the right agent, how one agent hands off to another, how you build a system where each agent has a narrow job it does exceptionally well.
The mental model: think of a company org chart. A general assistant (your orchestrator) takes everything that comes in, figures out what kind of work it is, and routes it to the right specialist. The code agent handles anything involving files and terminals. The research agent handles anything involving the web. The admin agent handles calendar and email. The orchestrator synthesizes and delivers results. Nobody does anyone else's job.
This isn't a built-in OpenClaw feature you flip on. It's a design pattern you build with OpenClaw's routing, binding, and spawning tools. The framework gives you the primitives; you design the organization.
Real-World Use Cases
People running multi-agent OpenClaw setups today:
- Work + Personal split — two agents, two separate contexts, two channel sets. Work conversations never contaminate personal memory. The simplest and most immediately useful multi-agent setup.
- Orchestrator + specialists — a general agent handles all inbound, classifies requests, and delegates to a code agent, a research agent, or an admin agent. Results come back and get synthesized.
- Main + Monitor — a main agent handles interactive conversations; a monitor agent runs on heartbeat, sweeps for anomalies, checks health, and reports proactively.
- Client-facing + internal — a public agent (locked down, sandboxed) handles client queries; a private agent with full access does the actual work behind the scenes.
- Overnight workforce — an orchestrator receives a task list before bed, breaks tasks into subtasks, spawns specialist sub-agents for each, and synthesizes results by morning.
Key Terms
Binding — a routing rule that maps inbound messages to a specific agent. Bindings are evaluated at message receipt: most specific rule wins.
sessions_spawn — an OpenClaw tool that creates a new, isolated sub-agent session. The spawning agent provides a prompt; the sub-agent runs it in a blank context and (optionally) delivers results back.
Orchestrator — the agent that receives all inbound requests, classifies them, and delegates to the right specialist. The "manager" in the org chart.
Specialist — an agent with a narrow, specific job. Tight SOUL.md, restricted tools, single-domain focus.
Agent team — a coordinated group of agents working on a shared problem. As of early 2026, this is a community design pattern built from primitives, not a native OpenClaw feature (it's on the roadmap).
How Routing Works: Bindings
Every inbound message arrives at the Gateway with three pieces of identity: which channel it came from, which account on that channel, and who sent it (the peer). Bindings match on these fields to decide which agent handles the message.
The evaluation is deterministic and most-specific-wins:
- Exact peer match (specific DM or group)
- Parent peer match (thread inheritance)
- Discord role + guild match
- Channel or team-level match
- Fallback to the default agent
The first rule that matches wins. Everything else is ignored.
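To make the precedence concrete, here is a minimal sketch of most-specific-wins evaluation in Python. It is illustrative only: the real Gateway also evaluates parent peers and Discord roles, and `Binding` and `route` are this sketch's names, not OpenClaw's API.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Binding:
    agent: str
    channel: Optional[str] = None
    peer: Optional[str] = None

def route(channel: str, peer: str, bindings: list, default: str = "main") -> str:
    """Most-specific-wins: an exact peer match beats a channel-only
    match, which beats the fallback default agent."""
    for b in bindings:                      # 1. exact peer match
        if b.channel == channel and b.peer == peer:
            return b.agent
    for b in bindings:                      # 2. channel-level match
        if b.channel == channel and b.peer is None:
            return b.agent
    return default                          # 3. fallback to the default agent
```

Note that evaluation stops at the first matching tier, which is what makes the outcome deterministic.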
The simplest binding setup — route entire channels to different agents:
{
agents: {
list: [
{
id: "work",
identity: { name: "Clive", emoji: "💼" },
bindings: [
{ channel: "slack" },
{ channel: "discord" }
]
},
{
id: "personal",
identity: { name: "Rex", emoji: "🏠" },
bindings: [
{ channel: "telegram" },
{ channel: "whatsapp" }
]
}
]
}
}
Slack and Discord → work agent. Telegram and WhatsApp → personal agent. Messages route automatically. Contexts stay completely separate.
Finer-grained routing — specific groups to specific agents:
{
agents: {
list: [
{
id: "code",
identity: { name: "Code Agent", emoji: "💻" },
bindings: [
// Only this specific Telegram group goes to the code agent
{ channel: "telegram", peer: "-1001234567890" }
]
},
{
id: "main",
identity: { name: "Clive", emoji: "🧠" },
// No bindings = catches everything not matched above (default)
}
]
}
}
Messages to the code-review Telegram group go to the code agent. Everything else — all other Telegram DMs, all other groups — falls through to the main agent.
The binding rule: bindings evaluate once at message receipt. You can't reroute mid-conversation. If a message matched the wrong agent, the conversation continues with that agent for its duration.
sessions_spawn — Delegating a Task
sessions_spawn is how one agent creates a sub-agent to handle a bounded task. It's the primitive that makes orchestration possible.
When an agent calls sessions_spawn, it creates a new isolated session with:
- A blank context (no history from the parent)
- A starting prompt provided by the parent
- Its own agent config (can be the same agent or a different one)
- An optional delivery target for results
Think of it exactly like a manager assigning a project to a team member: "Here's everything you need to know about this task. Go do it. Report back." The team member works in their own workspace, with their own focus, completely independently.
The critical constraint that breaks most first attempts: the sub-agent starts with zero context. It knows nothing about your ongoing conversation, your preferences, your project names, or anything discussed previously. Every single piece of context the sub-agent needs must be in the spawning prompt. Every time. No exceptions.
This feels repetitive when you first design it. It's actually a feature — it makes each sub-agent run reproducible and debuggable. You can read the spawn prompt and know exactly what the sub-agent was given.
Teaching your orchestrator to delegate via SOUL.md:
## Delegation Rules
### Code Tasks
When asked to review code, run tests, modify files, or execute shell commands:
1. Extract the full context needed: file paths, what to check, success criteria
2. Spawn a code-agent sub-session with ALL of that context
3. Wait for the result and summarize it clearly for the user
4. Do not attempt code tasks yourself — delegate every time
### Research Tasks
When asked to research a topic or find information online:
1. Define the exact question, desired output format, and where to save results
2. Spawn a research-agent sub-session with those specifications
3. Deliver the synthesized findings
### Rule
Never spawn a sub-agent with a vague prompt.
"Research AI trends" is not acceptable.
"Search for the top 5 papers on RAG published in 2025, extract key findings,
and save a bullet-point summary to ~/research/rag-2025.md" is acceptable.
The Agent Team Pattern
Power users building "AI workforces" use this org-chart structure:
You (via Telegram/WhatsApp)
↓
ORCHESTRATOR AGENT
├── Receives all inbound requests
├── Classifies: code / research / admin / personal
├── Spawns appropriate specialist via sessions_spawn
└── Synthesizes results, delivers to you
↓
SPECIALIST AGENTS (spawned as isolated sessions)
├── Code Agent — file ops, shell, git, tests
├── Research Agent — web search, synthesis, note-saving
├── Admin Agent — calendar, email, Todoist
└── Monitor Agent — heartbeat sweep, health checks
Each specialist is configured with:
A tight SOUL.md that defines exactly one role:
# Research Agent
You are a specialized research agent. Your only job is to search for information,
synthesize it accurately, and save it to the specified file location.
You do not answer questions. You do not chat. You research and write.
## Output Requirements
Always save results to the file path specified in your instructions.
Always cite your sources.
Always write a 2-sentence summary at the top of every file.
If you can't find reliable information, write "INSUFFICIENT SOURCES" and stop.
Only the tools it needs:
{
id: "researcher",
tools: {
allow: ["web_search", "web_fetch", "read", "write"],
deny: ["exec", "browser", "gateway", "cron", "sessions_spawn"]
},
sandbox: {
mode: "all",
scope: "session",
workspaceAccess: "rw" // needs to write research files
}
}
{
id: "coder",
tools: {
allow: ["read", "write", "exec"],
deny: ["web_search", "web_fetch", "gateway", "cron", "sessions_spawn"]
},
sandbox: {
mode: "all",
scope: "session",
workspaceAccess: "rw"
}
}
The researcher can browse the web but can't run shell commands. The coder can run shell commands but can't browse the web. Neither can spawn further sub-agents or modify gateway config. Each one has exactly what it needs for its job and nothing more.
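One way to keep these profiles honest is a small lint script run over your agent definitions. A sketch, where the policy rules are this example's and not OpenClaw's:

```python
def check_profile(agent: dict) -> list:
    """Flag tool combinations that widen the blast radius of a prompt
    injection. Illustrative policy, not an official OpenClaw check."""
    allowed = set(agent["tools"]["allow"])
    problems = []
    # An agent that reads untrusted web content should not also run shell commands
    if allowed & {"web_search", "web_fetch"} and "exec" in allowed:
        problems.append(f"{agent['id']}: combines web access with shell execution")
    # Specialists should never be able to spawn further sub-agents
    if "sessions_spawn" in allowed:
        problems.append(f"{agent['id']}: specialist can spawn sub-agents")
    return problems

# Profiles mirroring the config above
researcher = {"id": "researcher",
              "tools": {"allow": ["web_search", "web_fetch", "read", "write"]}}
coder = {"id": "coder", "tools": {"allow": ["read", "write", "exec"]}}
```

Running `check_profile` over both sample profiles returns no problems, which is the point: each passes because its allow list stays inside one domain.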
Inter-Agent Communication (Current Workarounds)
Native agent-to-agent messaging ("Agent Teams") is on the OpenClaw roadmap but not in stable release as of early 2026. What power users actually use:
Method 1: Shared workspace files (most reliable)
Sub-agents write results to a shared directory. The orchestrator reads from it on heartbeat or after spawning completes.
~/.openclaw/
└── shared/
├── research-queue.md ← orchestrator writes tasks here
├── research-results/ ← researcher writes results here
│ └── rag-2025.md
└── code-results/ ← coder writes results here
└── test-report.md
The orchestrator's SOUL.md instructs it to check ~/shared/research-results/ after spawning a research task. The researcher writes to that directory. Simple, reliable, auditable — you can read the files yourself to verify what happened.
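The orchestrator's side of this method amounts to "list the results directory, read anything new." A minimal sketch, assuming the directory layout above; persisting the `seen` set between runs is left out:

```python
import pathlib

def collect_new_results(results_dir: pathlib.Path, seen: set) -> dict:
    """Return {filename: contents} for result files the orchestrator has
    not yet processed, marking each as seen. A real setup would persist
    `seen`, e.g. in a small state file."""
    new = {}
    for path in sorted(results_dir.glob("*.md")):
        if path.name not in seen:
            new[path.name] = path.read_text()
            seen.add(path.name)
    return new
```

Calling this on heartbeat gives the orchestrator exactly the files written since its last sweep, and nothing twice.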
Method 2: Channel relay
Sub-agent delivers results to a specific Telegram group or channel topic. The orchestrator's binding includes that topic. It picks up the message and processes it.
// Research agent's cron job delivers to a dedicated results topic
openclaw cron add \
--name "Research delivery" \
--session isolated \
--agent researcher \
--message "Complete research task and deliver results." \
--announce --channel telegram --to "-1001234567890:topic:42"
The orchestrator is bound to that topic and sees the delivery. It picks it up, synthesizes, and sends to the user's main channel. More complex to set up, but useful when results need to flow through the messaging layer.
Method 3: Webhook relay
Sub-agent delivers results via webhook to a small local HTTP server. That server injects the result as a system event into the orchestrator's main session.
The most flexible approach. Requires running a small webhook receiver (a few lines of Python with FastAPI). Good for production setups where you want fine-grained control over the handoff.
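A minimal receiver doesn't even need FastAPI; the standard library is enough. This sketch just queues incoming JSON results in memory, and the actual injection into the orchestrator's session is left as a comment because that mechanism is setup-specific:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

INBOX = []  # results queued for the orchestrator (in-memory for this sketch)

class RelayHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        # A real relay would inject the payload into the orchestrator's
        # session here; the sketch just queues it for pickup.
        INBOX.append(payload)
        self.send_response(204)
        self.end_headers()

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet

# To run: HTTPServer(("127.0.0.1", 8787), RelayHandler).serve_forever()
```

Sub-agents then POST their results to the relay port, and the orchestrator drains the queue on its next turn.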
For most personal setups: use the shared workspace file method. It's transparent, debuggable, and doesn't require extra infrastructure.
Common Multi-Agent Configurations
Configuration 1: The Minimal (2 agents)
The setup that immediately pays off for almost everyone:
{
agents: {
list: [
{
id: "work",
identity: { name: "Clive", emoji: "💼" },
bindings: [{ channel: "slack" }],
model: { primary: "anthropic/claude-opus-4-6" }
// SOUL.md: knows your projects, clients, team, work context
},
{
id: "personal",
identity: { name: "Rex", emoji: "🏠" },
bindings: [{ channel: "telegram" }, { channel: "whatsapp" }],
model: { primary: "anthropic/claude-sonnet-4-20250514" }
// SOUL.md: knows your personal habits, routines, preferences
}
]
}
}
Work conversations stay in the work agent. Personal conversations stay in the personal agent. The work agent never knows about your evening plans. The personal agent never knows about client projects. Both accumulate focused, clean context over time. This alone makes OpenClaw dramatically more useful than a single generic agent.
Configuration 2: The Trio (3 agents)
Add a monitor agent that runs proactively:
{
agents: {
list: [
{ id: "main", /* ... interactive conversations ... */ },
{ id: "personal", /* ... personal channel ... */ },
{
id: "monitor",
identity: { name: "Monitor", emoji: "👁️" },
// No bindings — nobody messages this agent directly
// It only runs via heartbeat and cron
tools: {
allow: ["read", "web_fetch"],
deny: ["write", "exec", "gateway", "cron"]
}
}
]
}
}
The monitor agent's SOUL.md instructs it to: check task queue on heartbeat, flag anything overdue, check for new GitHub issues, verify the health ping was sent. It never initiates conversation — it runs on a schedule and delivers reports.
Configuration 3: The Workforce (N agents)
Each specialist is a separate agent. The orchestrator's SOUL.md contains a full routing guide:
## Routing Guide
**Code tasks** (file edits, running tests, git operations, shell commands):
→ Spawn code-agent session with full file/command context
**Research tasks** (web search, topic synthesis, competitive analysis):
→ Spawn research-agent session with explicit question + output format + save path
**Admin tasks** (calendar, email drafts, Todoist, scheduling):
→ Spawn admin-agent session with specific action and any relevant contacts/context
**Personal tasks** (habits, personal decisions, reflections):
→ Handle in this session — personal context lives here
**Unclear requests**:
→ Ask a clarifying question before delegating — never guess the category
The workforce pattern scales well in principle. In practice, every additional agent adds: more API costs per heartbeat, more config to maintain, more SOUL.md files to keep updated, and more complexity to debug when something goes wrong. Start with the minimal setup. Add agents as you prove the need.
The Context Problem (And How to Solve It)
The hardest thing about multi-agent OpenClaw is context. Each agent has its own isolated session history. They don't share memory. If you tell the work agent something important, the personal agent doesn't know. If the research agent discovers something, the orchestrator doesn't automatically see it.
You have to design around this. Three strategies:
1. Write everything important to shared files.
Make it a rule: anything that should survive across agents gets written to a file in a shared workspace directory. Your orchestrator's SOUL.md should instruct it to check specific files on startup. Your specialist agents should always write their outputs to that shared directory.
# Shared workspace structure
~/shared/
├── context.md ← persistent facts about me and my work
├── projects/ ← project-specific notes
├── research/ ← research outputs from researcher agent
└── tasks/
├── queue.md ← tasks waiting to be done
├── completed.md ← tasks done with results
2. Pass context explicitly in spawn prompts.
When spawning a sub-agent, include everything it needs. Don't assume it knows anything. A good spawn prompt looks like:
You are helping Tommy with a research task.
CONTEXT:
- Tommy is building an AI consulting practice
- He's currently evaluating three tools: LangGraph, CrewAI, and OpenClaw
- He wants to understand the differences in production reliability
TASK:
Search for recent (2025-2026) production case studies comparing LangGraph
and CrewAI. Find at least 3 real examples with concrete metrics.
Save a structured comparison to ~/shared/research/langgraph-vs-crewai.md.
OUTPUT FORMAT:
## [Tool Name] vs [Tool Name]: Production Comparison
| Metric | LangGraph | CrewAI |
| ... | ... | ... |
Sources: [list all]
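Because every spawn prompt must carry full context, it helps to assemble them from a template rather than write each one ad hoc. A sketch, where the field names are this example's convention and not a fixed schema:

```python
def build_spawn_prompt(user: str, context: list, task: str,
                       output_format: str, save_path: str) -> str:
    """Assemble a self-contained spawn prompt. Every field is required
    because the sub-agent starts with zero context."""
    context_lines = "\n".join(f"- {fact}" for fact in context)
    return (
        f"You are helping {user} with a research task.\n\n"
        f"CONTEXT:\n{context_lines}\n\n"
        f"TASK:\n{task}\n"
        f"Save results to {save_path}.\n\n"
        f"OUTPUT FORMAT:\n{output_format}\n"
    )
```

A template like this also makes spawn prompts auditable: every run's inputs are visible in one place.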
3. Use a single shared context.md file as agent memory.
The simplest shared memory system: a Markdown file that gets updated whenever anything important happens. All agents' SOUL.md files instruct them to read it at session start and update it when they learn something worth keeping.
# context.md
_Last updated: 2026-02-19 by work-agent_
## Active Projects
- ProductX pricing research (in progress, research-agent working)
- Client proposal for Acme Corp (draft in ~/drafts/acme-proposal.md)
## Recent Decisions
- Decided to use LangGraph over CrewAI for the client pipeline
- Moving hosting from AWS to Hetzner in Q2
## My Preferences
- Prefer bullet points in reports
- Always cite sources
- Send status updates to Telegram, not WhatsApp
This file is the closest thing to shared memory across agents. It's low-tech, transparent, and entirely under your control.
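Updating the file programmatically keeps the sections and the timestamp consistent. A sketch that operates on the file's text, following the stamp and heading conventions in the example above:

```python
import datetime

def update_context(text: str, section: str, entry: str, agent: str) -> str:
    """Append a bullet under a `## section` heading and refresh the
    'Last updated' stamp. Pure function: takes and returns file text."""
    stamp = f"_Last updated: {datetime.date.today().isoformat()} by {agent}_"
    lines = [stamp if ln.startswith("_Last updated:") else ln
             for ln in text.splitlines()]
    out = []
    for ln in lines:
        out.append(ln)
        if ln.strip() == f"## {section}":
            out.append(f"- {entry}")  # newest entries sit at the top
    return "\n".join(out)
```

Each agent's SOUL.md can then say "call this after learning anything worth keeping," and the file stays in a predictable shape.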
Security in Multi-Agent Setups
Multi-agent setups amplify both capability and risk. A few specific things to lock down:
Never give sessions_spawn to agents that process external content. If a research agent that reads web pages can also spawn sub-agents, a prompt injection that hijacks the research agent can spawn additional agents with expanded permissions. Keep sessions_spawn only on the orchestrator, and only if the orchestrator never reads untrusted content directly.
Each agent should have the minimum tools for its role. Don't give the research agent exec permissions "just in case." Don't give the admin agent access to web_fetch. Each agent's tool profile should describe exactly and only what its job requires.
Shared workspace files can be injection vectors. If the researcher writes to ~/shared/research/output.md and the orchestrator reads that file, a prompt injection in a web page that the researcher fetched could embed instructions in the output file that then hijack the orchestrator when it reads them. Mitigate with the reader-agent pattern: have the orchestrator summarize research files rather than read them raw. Or restrict the orchestrator from auto-acting on file contents without user review.
Gotchas
Sub-agents forget everything — every run. The most common mistake. You spawn a sub-agent to do follow-up research on something discussed yesterday. It has no idea what you're referring to. You must pass the full context in the spawn prompt. Design your orchestrator's delegation instructions to always extract and pass the relevant context before spawning.
More agents = more API costs, compounding. If you have 5 agents and each heartbeats every 30 minutes, you have 10 heartbeat turns per hour. At Opus pricing, that adds up. Right-size agent counts and heartbeat intervals to actual need. Most monitors can heartbeat every 60 minutes. Not everything needs Opus.
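The arithmetic above is worth keeping visible when you size a setup. A trivial helper:

```python
def heartbeat_turns_per_hour(num_agents: int, interval_minutes: int) -> float:
    """Model-call turns per hour generated by heartbeats alone."""
    return num_agents * (60 / interval_minutes)

# 5 agents heartbeating every 30 minutes: 10 turns/hour, 240 turns/day
daily = heartbeat_turns_per_hour(5, 30) * 24
```

Doubling the interval to 60 minutes halves that baseline cost before you touch model choice.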
Bindings are static. You can't dynamically reroute a conversation mid-stream. If someone messages the wrong channel, that conversation is with the wrong agent for its duration. Design your channel-to-agent mapping thoughtfully upfront — it's not easy to change after people are using it.
sessions_spawn is not native inter-agent communication. The parent spawns a child, the child runs to completion, results are delivered or saved. There's no back-and-forth between running agents. If your design requires agents to negotiate or coordinate in real-time, you need the webhook relay approach — or wait for Agent Teams to land in a stable release.
Don't build the workforce before proving the minimal setup. Two agents (work + personal) that work reliably is worth more than five agents where you're not sure which one handled what. Add agents when you have a clear, proven need for specialization. Never add agents for hypothetical future use cases.
Sources
- Multi-Agent Routing - OpenClaw Docs
- OpenClaw Multi-Agent Orchestration Advanced Guide
- OpenClaw Sub-agents and Parallel Task Execution
- OpenClaw: Creating the AI Agent Workforce (o-mega.ai)
- RFC: Agent Teams Discussion (GitHub)