Module 3.1
Multi-Agent Architecture Patterns
What Are Multi-Agent Architecture Patterns?
A multi-agent system is a collection of AI agents that collaborate to complete a task no single agent could handle alone. Each agent has a specialized role, its own context, and its own tools. They communicate, pass results, and coordinate to reach a shared goal.
Architecture patterns are proven blueprints for how those agents should be organized. Not the code — the structure. How they relate to each other. Who coordinates whom. How information flows. Whether agents work in sequence, in parallel, or in a dynamic network.
Think of it like organizational design for a company. Before you hire anyone, you decide: is this a flat team where everyone collaborates? A hierarchy with a manager? A pipeline where each person hands off to the next? The answer shapes everything — communication overhead, who can block whom, how fast work gets done, and what happens when something goes wrong.
The same decisions exist for multi-agent systems. And like organizational design, the wrong structure for the job causes dysfunction no matter how talented the individual agents are.
Real-World Use Cases
Before learning the patterns, it helps to see what these systems actually do in the real world:
LinkedIn — SQL Bot. Employees ask data questions in plain English. A multi-agent system understands the question, finds the right database tables, writes a SQL query, executes it, and if there's an error — diagnoses and fixes it, then retries. The retry loop is the key reason this needed a multi-agent architecture. A straight chain can't loop back on itself.
Uber — Code migration. Uber needed to migrate a massive codebase across their developer platform. They built a network of specialized agents where each one owns exactly one step: reading the old code, generating the migrated version, writing unit tests, validating correctness. A task that would take engineers weeks runs autonomously.
Elastic — Threat detection. When a security threat arrives, a routing agent decides which specialist agents to dispatch — one checks IP reputation, one analyzes log patterns, one queries threat databases. They all run simultaneously, and their findings are synthesized into a response. Parallel execution, centralized synthesis.
PwC — Code generation pipeline. A team of agents collaborates: one generates the code, another reviews it, another writes tests, another validates compliance. Code accuracy went from 10% to 70%.
DocuSign — Lead qualification. Multiple agents each extract information from different internal systems (CRM, marketing platform, product usage data), then a synthesis agent scores and qualifies the lead. No single agent has access to all the data — they work in parallel, each contributing their piece.
The pattern across all of these: the task had natural boundaries that genuinely required specialization, parallel work, or loops that a single agent can't do.
Key Terms to Know
Before the patterns, a quick glossary. These words will appear constantly:
Orchestrator — an agent whose job is to coordinate other agents. It breaks tasks down, routes work, and synthesizes results. It typically doesn't do the domain work itself.
Worker — a specialized agent that does the actual work. A research worker searches. A code worker writes code. A review worker checks quality.
Supervisor — similar to an orchestrator, but specifically implies a hierarchical authority relationship — the supervisor validates and approves worker outputs, not just routes them.
Handoff — one agent passing full control of a task to another. Like a relay baton — only one agent runs at a time.
State — the shared data object that agents read from and write to. It's the "working memory" of the entire system.
Reducer — a rule that controls how state updates merge when multiple agents write to the same field at the same time.
Termination condition — the rule that says "stop." Without one, agent loops run forever. Every loop must have one.
Context window — the maximum amount of text a model can "see" at once. In multi-agent systems, managing this carefully is critical — long-running agents accumulate context until they hit the ceiling.
Fan-out — splitting one task into multiple parallel tasks (one → many).
Fan-in — collecting parallel results and merging them (many → one).
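Several of these terms become concrete in a few lines of Python. The sketch below is illustrative only — the names (`merge_findings`, `state`, `MAX_ITERATIONS`) are not from any framework. It shows a reducer merging parallel writes to one state field, and a termination condition capping a loop:

```python
# Minimal illustration of state, a reducer, and a termination condition.
# All names here are illustrative, not tied to any specific framework.

def merge_findings(existing: list, update: list) -> list:
    """Reducer: instead of overwriting, append parallel writes."""
    return existing + update

state = {"findings": []}

# Two "agents" write to the same field; the reducer merges their updates
# instead of letting the second write clobber the first.
for agent_output in (["ip looks clean"], ["log anomaly at 02:14"]):
    state["findings"] = merge_findings(state["findings"], agent_output)

# Termination condition: every loop gets a hard iteration cap.
MAX_ITERATIONS = 5
for iteration in range(MAX_ITERATIONS):
    done = len(state["findings"]) >= 2  # stand-in for a real completion check
    if done:
        break
```

Without the reducer, the second write would simply overwrite the first — which is exactly the parallel-state conflict described later in the failure modes.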
When NOT to Use Multi-Agent
This is the most important section in the module, which is why it comes before the patterns.
Multi-agent systems are powerful — and they're also slower, more expensive, harder to debug, and fail in ways that single-agent systems don't. Reported failure rates for multi-agent systems in production range from 41% to 86.7%, and roughly 40% of multi-agent pilots fail within six months of deployment. That's not a reason to avoid them — it's a reason to use them only when genuinely needed.
Always ask first: can a single agent with good tools solve this?
If yes — stop. Use a single agent. The overhead of coordination, additional latency, and new failure modes must be justified by genuine requirements.
Use multi-agent when:
- The problem spans multiple domains that genuinely need specialization
- Independent subtasks can be parallelized to meaningfully reduce time
- Security boundaries must separate agents (a reviewer that literally cannot write files)
- Context window limits require distributing memory across agents
Do not use multi-agent because it sounds more impressive. That's the most common reason people do it, and it produces complex systems that are harder to maintain and more likely to fail.
The 9 Core Patterns
Pattern 1: Sequential / Pipeline
What it is: Agents are chained in a fixed order. Each agent's output becomes the next agent's input. Nobody decides where to send their work — it's predetermined.
Input → [Agent A] → [Agent B] → [Agent C] → Output
             ↑           ↑           ↑
         adds value  adds value  adds value
The analogy: An assembly line. Each station does one thing and passes to the next. No station decides where the product goes next.
When to use:
- Multi-stage processes with clear linear dependencies
- Progressive refinement: draft → review → edit → publish
- Each stage genuinely adds value the next stage depends on
When not to use:
- Stages can run independently (use parallel instead)
- A failure early on would invalidate all downstream work
Why it fails: If Agent A produces bad output, Agents B and C will dutifully process that bad output and produce their own polished bad output. Failures cascade. Validation between stages is not optional.
Real example: A law firm's contract pipeline — Template Selection Agent → Clause Customization Agent → Regulatory Compliance Agent → Risk Assessment Agent. Each stage requires the complete output of the prior stage.
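A minimal sketch of the pattern, with plain functions standing in for LLM-backed agents (all names here are illustrative). The point is the `validate` gate between stages — the guard against the cascading-failure problem described above:

```python
# Sketch of a sequential pipeline with validation between stages.
# Each "agent" is a plain function standing in for an LLM-backed agent.

def draft(topic: str) -> str:
    return f"Draft about {topic}"

def review(text: str) -> str:
    return text + " [reviewed]"

def edit(text: str) -> str:
    return text + " [edited]"

def validate(stage: str, output: str) -> str:
    """Inter-stage gate: catch bad output before it cascades downstream."""
    if not output or not output.strip():
        raise ValueError(f"{stage} produced empty output; aborting pipeline")
    return output

def run_pipeline(topic: str) -> str:
    # Fixed order: each agent's validated output feeds the next agent.
    result = validate("draft", draft(topic))
    result = validate("review", review(result))
    result = validate("edit", edit(result))
    return result
```

In a real system the validation step would check structure and quality, not just emptiness — but it lives in the same place: between stages, before the handoff.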
Pattern 2: Concurrent / Parallel (Fan-Out / Fan-In)
What it is: Multiple agents receive the same input simultaneously, work independently, and their results are collected and synthesized by an aggregator.
          → [Agent A] →
Input →   → [Agent B] →   [Aggregator] → Output
          → [Agent C] →
The analogy: Asking three independent consultants to assess the same business problem simultaneously, then combining their reports.
When to use:
- Multiple independent perspectives on the same input add value
- Time-sensitive tasks where parallelism reduces latency
- The analyses are genuinely independent (no agent needs another's result)
When not to use:
- Agents need to build on each other's work
- Results may contradict and you have no clear conflict resolution strategy
- Resource costs make parallelism prohibitive
Why it fails: Without a good aggregation strategy, conflicting agent outputs create chaos. You need a principled way to reconcile disagreement — majority voting, a weighted merge, or a synthesizer agent.
Real example: Financial analysis — Fundamental Analysis Agent + Technical Analysis Agent + Sentiment Analysis Agent + ESG Agent all analyze the same stock simultaneously. Results synthesized into one recommendation. Research shows up to 81% performance improvement on parallel tasks compared to sequential.
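A sketch of fan-out / fan-in using a thread pool, with the analyst functions as illustrative stand-ins for the agents in the example above. The same input goes to every analyst simultaneously; an aggregator merges the independent results:

```python
# Sketch of fan-out / fan-in: the same input goes to several independent
# analyst functions in parallel; an aggregator merges their outputs.
from concurrent.futures import ThreadPoolExecutor

def fundamental(ticker):  return {"fundamental": f"{ticker}: earnings solid"}
def technical(ticker):    return {"technical": f"{ticker}: above 200-day avg"}
def sentiment(ticker):    return {"sentiment": f"{ticker}: news mildly positive"}

ANALYSTS = [fundamental, technical, sentiment]

def analyze(ticker: str) -> dict:
    # Fan-out: every analyst receives the same input simultaneously.
    with ThreadPoolExecutor(max_workers=len(ANALYSTS)) as pool:
        results = pool.map(lambda fn: fn(ticker), ANALYSTS)
    # Fan-in: the aggregator merges independent results into one report.
    report = {}
    for partial in results:
        report.update(partial)
    return report
```

Note that each analyst writes to its own key, so the merge is trivially conflict-free — a cheap way to sidestep the aggregation problem when outputs don't overlap.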
Pattern 3: Orchestrator-Worker (Hub-and-Spoke)
What it is: A central orchestrator receives a task, dynamically decides how to break it down, routes subtasks to specialized workers, observes their results, and decides next steps. The problem structure emerges at runtime — the orchestrator adapts based on what comes back.
User → [Orchestrator]
          ↓      ↓      ↓      (dynamic routing decisions)
   [Worker A] [Worker B] [Worker C]
          ↓      ↓      ↓      (results returned)
      [Orchestrator synthesizes]
                 ↓
               Output
The key distinction from Sequential: Sequential routing is predetermined. Orchestrator-Worker routing is decided dynamically by the orchestrator based on partial results. The orchestrator can call Worker B before Worker A if that's what the situation requires.
The analogy: A project manager who adapts the plan as work comes in, rather than following a fixed script.
When to use:
- The problem structure isn't fully known upfront
- Different subtasks need different specialists
- The orchestrator needs to react to intermediate results
When not to use:
- The task sequence is fully predictable from the start (Sequential is simpler)
- High throughput is required (the orchestrator becomes a bottleneck)
Why it fails: The orchestrator's context window fills up as it receives results from every worker. Long-running systems need active context management or the orchestrator degrades.
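The core loop can be sketched as follows. The routing rule here is a keyword check standing in for an LLM routing decision, and all names are illustrative — the point is that the route is chosen per task at runtime, and even the orchestrator's own loop gets a hard cap:

```python
# Sketch of an orchestrator loop with dynamic routing. The routing rule is
# a simple keyword check standing in for an LLM routing decision.

def search_worker(task):  return f"search results for: {task}"
def code_worker(task):    return f"code written for: {task}"

WORKERS = {"search": search_worker, "code": code_worker}

def orchestrate(tasks, max_steps=10):
    results = []
    pending = list(tasks)
    for _ in range(max_steps):      # hard cap: orchestrators loop too
        if not pending:
            break
        task = pending.pop(0)
        # Dynamic routing: decided per task, not predetermined upfront.
        worker = WORKERS["code" if "implement" in task else "search"]
        results.append(worker(task))
    return results
```

A real orchestrator would also summarize worker results before folding them into its own context — that compaction step is what keeps long-running orchestrators from degrading.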
Pattern 4: Supervisor / Hierarchical
What it is: A supervisor sits at the top of a tree. It delegates to workers (or sub-supervisors), monitors their outputs, validates completion, and synthesizes the final result. Multiple layers are possible.
            [Top Supervisor]
             ↓           ↓
    [Supervisor A]  [Supervisor B]
      ↓       ↓       ↓       ↓
   Worker  Worker  Worker  Worker
The key distinction from Orchestrator-Worker: Orchestrator-Worker is typically one level deep and routing-focused. Supervisor/Hierarchical explicitly supports multiple layers with quality assurance gates at each level — supervisors validate, not just route.
When to use:
- Complex multi-domain workflows requiring auditability
- Finance, healthcare, legal — where every decision needs a reviewable chain of responsibility
- Enterprise workflows with clear departmental separation
When not to use:
- Coordination overhead outweighs the benefit
- Latency is critical (hierarchical routing adds hops at every level)
Why it fails: Every task must travel up and down the hierarchy. In a 3-level hierarchy, a simple question makes 6 trips before getting an answer. Hierarchy should only exist where it genuinely adds oversight value.
Pattern 5: Handoff / Routing / Triage
What it is: Agents dynamically transfer full control to another agent when they reach the limit of their expertise. Only one agent is active at a time. The sequence is not predetermined — it emerges from each agent's own self-assessment as they work.
[Triage Agent] → determines expertise needed
↓
[Specialist A] → hits its limit, hands off
↓
[Specialist B] → resolves or hands off further
↓
Output
The analogy: A hospital triage system. You arrive, triage determines urgency, sends you to general intake, who sends you to a specialist, who may send you to a sub-specialist. Full context transfers with you at each step.
When to use:
- The right specialist for a task isn't known upfront
- Multiple specialized domains where only one is needed at a time
- Clear signals exist for when an agent should hand off
When not to use:
- The right agent sequence is predictable from the initial input
- Risk of infinite handoff loops between agents is hard to prevent
Why it fails: Agents can ping-pong. Agent A decides it can't handle something and hands to Agent B. Agent B decides it's actually Agent A's domain and hands back. Without explicit loop prevention, this cycles forever.
Real example (Microsoft): Telecom customer service — Triage Agent handles common issues → hands off to Technical Infrastructure Agent for network problems → who may hand off to Financial Resolution Agent for billing disputes.
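A sketch of handoff with the ping-pong guard built in: a hard cap on handoffs plus a record of which agents have already seen the task. Agent names and routing rules are illustrative stand-ins:

```python
# Sketch of handoff with explicit ping-pong prevention: a hard cap on the
# number of handoffs plus a record of who has already handled the task.

def triage(task):
    return "technical" if "network" in task else "billing"

def technical(task):
    # Hands billing questions off rather than guessing outside its domain.
    return ("handoff", "billing") if "refund" in task else ("done", "network fixed")

def billing(task):
    return ("done", "refund issued")

AGENTS = {"technical": technical, "billing": billing}
MAX_HANDOFFS = 5

def handle(task):
    history = []
    current = triage(task)
    for _ in range(MAX_HANDOFFS):        # prevents infinite ping-pong
        history.append(current)
        status, payload = AGENTS[current](task)
        if status == "done":
            return payload
        if payload in history:           # refuse to bounce back to a prior agent
            raise RuntimeError(f"handoff loop detected: {history + [payload]}")
        current = payload
    raise RuntimeError("exceeded handoff budget")
```

The two guards are independent: the budget catches long chains, the history check catches A→B→A cycles immediately.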
Pattern 6: Group Chat / Roundtable
What it is: Multiple agents participate in a shared conversation thread. A chat manager controls who speaks next. Agents read the accumulating thread and contribute from their specialization. The conversation continues until a termination condition is met.
[Chat Manager controls turn order]
↓
Agent A posts → Agent B responds → Agent C responds → Agent A revises → ... → consensus
The Maker-Checker variant: A Generator agent creates output → a Critic agent evaluates it → if rejected, Generator revises → repeat until approved. This is one of the most useful multi-agent patterns in practice.
When to use:
- Decision-making that benefits from debate and multiple perspectives
- Creative brainstorming
- Quality assurance requiring iterative refinement
- Human-in-the-loop scenarios (a human can participate in the conversation)
When not to use:
- Real-time tasks where discussion overhead is unacceptable
- More than ~3 agents (conversations become unmanageable)
- No clear termination condition (it will loop forever without one)
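The Maker-Checker variant is simple enough to sketch directly. The generator and critic below are trivial stand-ins for LLM agents; what matters is the shape of the loop — critic feedback drives revision, and a hard round cap is the termination condition:

```python
# Sketch of the Maker-Checker loop: a generator revises until a critic
# approves, with a hard iteration cap as the termination condition.

def generator(brief, feedback):
    """Stand-in generator: revises its draft when given feedback."""
    text = f"Post about {brief}"
    return text + " (revised)" if feedback else text

def critic(text):
    """Stand-in critic: returns None if approved, else revision feedback."""
    return None if "(revised)" in text else "needs revision"

def maker_checker(brief, max_rounds=3):
    feedback = None
    draft = generator(brief, feedback)
    for _ in range(max_rounds):          # the non-negotiable termination condition
        feedback = critic(draft)
        if feedback is None:
            return draft                 # approved
        draft = generator(brief, feedback)
    return draft   # best effort once the round budget is exhausted
```

Note the loop returns its best effort rather than raising when the budget runs out — whether that's acceptable depends on whether a human reviews the output downstream.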
Pattern 7: Swarm
What it is: Multiple peer agents work independently in parallel without any central coordinator. Each agent interacts directly with the environment. Collective behavior emerges from their independent actions — not from top-down direction.
The analogy: An ant colony. No ant is in charge. Each ant follows local rules and responds to local stimuli. The colony-level behavior — building a nest, finding food — emerges from millions of simple individual actions.
When to use:
- High-coverage tasks where the search space is large (deep web research across many sources)
- Fault tolerance matters more than strict consistency
- Redundancy and diverse exploration are valuable
When not to use:
- Token budget is tight — swarms burn tokens extremely fast
- Strict consistency is required
- You need predictable, auditable execution paths
Critical rule: Always pair a swarm with a deliberate aggregation agent. Swarms that run standalone without a consolidation phase produce volume, not answers.
Pattern 8: Magentic / Dynamic Orchestration
What it is: A manager agent builds and refines a task ledger as it discovers what needs to be done. Rather than executing a predetermined plan, the manager iterates — consulting agents, building a plan, refining it, and only then executing. The task list itself evolves as the work reveals new requirements.
When to use:
- Genuinely open-ended problems with no predetermined solution path
- Incident response, where the remediation plan must adapt as the situation evolves
- Research planning, where what needs to be investigated changes based on early findings
When not to use:
- The solution path is known — determinism is cheaper and faster
- Time pressure is high (Magentic is slow — it deliberates before acting)
- Total cost is critical (Magentic's cost is hard to predict)
Pattern 9: Blackboard
What it is: A shared knowledge space that all agents can read from and write to. No central coordinator assigns tasks — agents monitor the board and act when they see something they can contribute to.
The analogy: A whiteboard in a war room. Anyone can write on it, anyone can read it, and people step up to contribute what they know.
When to use:
- Creative collaboration where specialists build on each other's partial contributions
- Research synthesis where any agent might have relevant expertise at any point
Watch out for: Write contention — two agents writing the same field simultaneously. Requires serialization or reducers. Also, losing shared state is catastrophic because no individual agent has a backup.
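One way to address the write-contention problem is to serialize writes behind a lock and accumulate contributions rather than overwrite them. This is a minimal sketch (the `Blackboard` class and agent names are illustrative, not from any framework):

```python
# Sketch of a blackboard with serialized, append-only writes. A lock
# prevents two agents from clobbering each other's updates.
import threading

class Blackboard:
    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}

    def post(self, agent, key, value):
        with self._lock:                 # serialize write contention
            self._entries.setdefault(key, []).append((agent, value))

    def read(self, key):
        with self._lock:
            return list(self._entries.get(key, []))

board = Blackboard()
board.post("researcher", "hypotheses", "latency spike is GC-related")
board.post("profiler", "hypotheses", "heap grows during batch jobs")
```

Appending attributed entries instead of overwriting also gives you a free audit trail — and since the board is the only copy of shared state, a real deployment would persist it, not keep it in memory.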
How to Choose the Right Pattern
The single most important question: does the problem structure emerge at runtime, or is it known upfront?
Is the structure known upfront?

  YES → Is it linear?
          YES → Sequential
          NO  → Is it parallelizable?
                  YES → Concurrent

  NO (structure emerges at runtime) → Does one agent handle it at a time?
          YES → Handoff/Routing
          NO  → Orchestrator-Worker
                (or Magentic if truly open-ended)
Special cases:
Need debate/consensus? → Group Chat
Need high coverage, cost secondary? → Swarm
Need multi-layer audit trail? → Supervisor/Hierarchical
Specialists build on partial work? → Blackboard
The hybrid principle: Most real systems combine patterns. A typical production system might use Sequential for the main pipeline, Concurrent for the research phase within that pipeline, and a Maker-Checker loop for quality validation. Don't look for one pattern — look for the right pattern for each stage.
How Agents Communicate
Six ways information flows between agents:
1. Direct message passing — Agent A calls Agent B via an API, waits for a response. Simple, tightly coupled. Good for sequential patterns.
2. Shared state object — All agents read from and write to a common data structure (like LangGraph's TypedDict state). Each agent writes to its own keys to avoid conflicts.
3. Message queues (async) — Agents publish to queues (Kafka, Redis, RabbitMQ). Other agents subscribe and process independently. Decoupled, fault-tolerant, high-throughput.
4. Accumulating conversation thread — All agents see the growing message history (how AutoGen and Group Chat work). Context window fills up faster with this approach.
5. Tool calls / Agent-as-Tool — One agent invokes another via the tool interface. The parent agent calls the sub-agent as if it were a function and gets a result back.
6. Blackboard / shared memory — A public knowledge space agents post to and read from. No direct addressing between agents.
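Mechanism 5 (Agent-as-Tool) is worth seeing concretely, since it's the simplest to implement: the sub-agent is just a callable. The names below are illustrative stand-ins:

```python
# Sketch of the Agent-as-Tool mechanism: the parent invokes a sub-agent
# through a plain function interface and receives only the result back.

def summarizer_agent(text):
    """Sub-agent exposed as a tool: callable in, result out, no shared state."""
    return text[:40] + "..." if len(text) > 40 else text

TOOLS = {"summarize": summarizer_agent}

def parent_agent(document):
    # The parent "calls the tool" exactly as it would call any function.
    # The sub-agent's internal reasoning never enters the parent's context.
    return TOOLS["summarize"](document)
```

The isolation is the point: because only the result crosses the boundary, the parent's context window stays small no matter how much work the sub-agent did internally.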
Trust, Security, and Permissions
Multi-agent systems introduce a critical security property that single-agent systems don't have: a compromised agent can propagate malicious instructions to the entire network.
If an agent reads a malicious web page that contains a prompt injection attack, and that agent passes its output to other agents, the attack can spread. This isn't theoretical — it's a real production concern.
The trust hierarchy to apply:
- Instructions arriving via system prompt → high trust (you wrote them)
- Outputs from other agents → medium trust (validate before acting on)
- Content from external sources (web, files, APIs) → low trust (treat as potentially adversarial)
The principle of least privilege: Give each agent only the permissions it actually needs. A research agent doesn't need write access. A reviewer doesn't need database credentials. If an agent gets compromised, limited permissions limit the blast radius.
Structured schemas for agent-to-agent communication: When agents pass JSON instead of free text, format mismatches are caught early. When a planner outputs YAML but the executor expects JSON, you get a cascading failure. Use typed interfaces between agents, not natural language.
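A minimal stdlib-only sketch of such a typed boundary, standing in for a Pydantic-style schema (the `PlanStep` fields are illustrative). The executor parses and validates the planner's output at the boundary, so a format mismatch fails loudly in one place instead of cascading:

```python
# Sketch of a typed interface between agents, using only the stdlib as a
# stand-in for Pydantic-style schemas. A planner must emit JSON matching
# PlanStep; a mismatch fails at the boundary, not three agents downstream.
import json
from dataclasses import dataclass

@dataclass
class PlanStep:
    action: str
    target: str

    @classmethod
    def from_json(cls, raw):
        data = json.loads(raw)           # rejects YAML / free text outright
        missing = {"action", "target"} - data.keys()
        if missing:
            raise ValueError(f"planner output missing fields: {missing}")
        return cls(action=data["action"], target=data["target"])

step = PlanStep.from_json('{"action": "migrate", "target": "auth-service"}')
```

In practice a library like Pydantic buys you the same guarantee with richer type coercion and error messages, but the principle is identical: parse and validate at every agent boundary.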
The 10 Failure Modes to Design Against
These aren't edge cases. They're the normal ways multi-agent systems fail in production.
1. Coordination failures (36.94% of production issues) — Agents misinterpret their roles, duplicate work, or produce conflicting outputs. Fix: structured JSON schemas for inter-agent communication, not natural language.
2. Cascading errors — A format mismatch in step 1 (YAML vs JSON) propagates and amplifies through every downstream agent. Fix: validate agent output before passing it to the next agent.
3. Infinite loops — No termination condition. Fix: every loop gets a hard maximum iteration count. No exceptions.
4. Context window exhaustion — Every agent interaction adds tokens. Long-running systems fill the window. Fix: summarize and compact between agent handoffs; pass only what the next agent actually needs.
5. Prompt injection propagation — Malicious content in external data hijacks one agent, which then passes bad instructions to others. Fix: treat external content as untrusted; validate at every boundary.
6. Parallel state conflicts — Two concurrent agents write to the same state field without a reducer. One overwrites the other nondeterministically. Fix: define reducers for any field that parallel agents might write.
7. Token cost explosion — A 4-agent crew using GPT-4 can cost 10-50x a single-agent approach. Fix: model routing — cheap models for simple tasks, expensive models only where needed.
8. Manager agent bottleneck — In hierarchical systems, every task routes through the supervisor. If the supervisor is slow or confused, the whole system stalls. Fix: use a capable model for supervisors; test supervisor routing logic in isolation.
9. Verification gaps — Nobody's watching what agents actually do. Silent failures accumulate. Fix: instrument everything; use LLM-as-judge evaluations (not exact-match assertions, which don't work for probabilistic outputs).
10. Information withholding — Agents fail to share important data or deviate from their objectives. Fix: structured output schemas; orchestrator-level validation of every output before passing downstream.
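Several of the fixes above (3, 8, and the retry discipline behind 2) share one shape: a bounded retry loop around an agent call. A hedged sketch, with `flaky_agent` as an illustrative stand-in for an agent that fails twice before succeeding:

```python
# Sketch of explicit failure recovery for an agent call: bounded retries
# with exponential backoff. `flaky_agent` is an illustrative stand-in.
import time

attempts = {"n": 0}

def flaky_agent():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TimeoutError("agent did not respond")
    return "ok"

def call_with_retries(fn, max_attempts=3, backoff=0.01):
    last_error = None
    for attempt in range(max_attempts):   # hard cap: never retry forever
        try:
            return fn()
        except TimeoutError as err:
            last_error = err
            time.sleep(backoff * (2 ** attempt))   # exponential backoff
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error

result = call_with_retries(flaky_agent)
```

The `raise ... from last_error` preserves the original failure for debugging — important in a multi-agent trace where the interesting error is usually the first one, not the last.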
The 10 Framework-Agnostic Design Principles
These apply regardless of whether you use LangGraph, CrewAI, AutoGen, or anything else.
- Match architecture to task structure — the wrong pattern can reduce performance by 70% compared to the right one. Choose deliberately.
- Start with the lowest complexity that works — single agent first, multi-agent only when justified.
- Use structured schemas for inter-agent communication — JSON/Pydantic, not natural language.
- One resource owner per resource — each database, API, or file belongs to exactly one agent.
- Zero trust between agents — validate outputs at every boundary; don't blindly aggregate.
- Design for observability first — trace every agent operation from day one.
- Match model to role — don't use Opus for classification tasks; use Haiku.
- Explicit failure recovery — timeouts, retries, circuit breakers, max iterations on every loop.
- Context compaction between agents — pass cleaned outputs, not full reasoning histories.
- Patterns are composable — combine the right pattern for each stage of your system.
Sources
- Anthropic Engineering — Building Effective Agents
- AI Agent Orchestration Patterns — Azure Architecture Center (Microsoft, Feb 2026)
- Designing Effective Multi-Agent Architectures — O'Reilly Radar
- Multi-Agent Collaboration Patterns — AWS Prescriptive Guidance
- Why Do Multi-Agent LLM Systems Fail? — arXiv:2503.13657
- AI Agent Architecture Patterns — Redis Engineering Blog
- Developer's Guide to Multi-Agent Patterns — Google ADK