AI Engineering Curriculum
Phase 3: Multi-Agent Systems

Module 3.1

Multi-Agent Architecture Patterns

What Are Multi-Agent Architecture Patterns?

A multi-agent system is a collection of AI agents that collaborate to complete a task no single agent could handle alone. Each agent has a specialized role, its own context, and its own tools. They communicate, pass results, and coordinate to reach a shared goal.

Architecture patterns are proven blueprints for how those agents should be organized. Not the code — the structure. How they relate to each other. Who coordinates whom. How information flows. Whether agents work in sequence, in parallel, or in a dynamic network.

Think of it like organizational design for a company. Before you hire anyone, you decide: is this a flat team where everyone collaborates? A hierarchy with a manager? A pipeline where each person hands off to the next? The answer shapes everything — communication overhead, who can block whom, how fast work gets done, and what happens when something goes wrong.

The same decisions exist for multi-agent systems. And like organizational design, the wrong structure for the job causes dysfunction no matter how talented the individual agents are.


Real-World Use Cases

Before learning the patterns, it helps to see what these systems actually do in the real world:

LinkedIn — SQL Bot. Employees ask data questions in plain English. A multi-agent system understands the question, finds the right database tables, writes a SQL query, executes it, and if there's an error — diagnoses and fixes it, then retries. The retry loop is the key reason this needed a multi-agent architecture. A straight chain can't loop back on itself.

Uber — Code migration. Uber needed to migrate a massive codebase across their developer platform. They built a network of specialized agents where each one owns exactly one step: reading the old code, generating the migrated version, writing unit tests, validating correctness. A task that would take engineers weeks runs autonomously.

Elastic — Threat detection. When a security threat arrives, a routing agent decides which specialist agents to dispatch — one checks IP reputation, one analyzes log patterns, one queries threat databases. They all run simultaneously, and their findings are synthesized into a response. Parallel execution, centralized synthesis.

PwC — Code generation pipeline. A team of agents collaborates: one generates the code, another reviews it, another writes tests, another validates compliance. Code accuracy went from 10% to 70%.

DocuSign — Lead qualification. Multiple agents each extract information from different internal systems (CRM, marketing platform, product usage data), then a synthesis agent scores and qualifies the lead. No single agent has access to all the data — they work in parallel, each contributing their piece.

The pattern across all of these: the task had natural boundaries that genuinely required specialization, parallel work, or loops that a single agent can't do.


Key Terms to Know

Before the patterns, a quick glossary. These words will appear constantly:

Orchestrator — an agent whose job is to coordinate other agents. It breaks tasks down, routes work, and synthesizes results. It typically doesn't do the domain work itself.

Worker — a specialized agent that does the actual work. A research worker searches. A code worker writes code. A review worker checks quality.

Supervisor — similar to an orchestrator, but specifically implies a hierarchical authority relationship — the supervisor validates and approves worker outputs, not just routes them.

Handoff — one agent passing full control of a task to another. Like a relay baton — only one agent runs at a time.

State — the shared data object that agents read from and write to. It's the "working memory" of the entire system.

Reducer — a rule that controls how state updates merge when multiple agents write to the same field at the same time.

Termination condition — the rule that says "stop." Without one, agent loops run forever. Every loop must have one.

Context window — the maximum amount of text a model can "see" at once. In multi-agent systems, managing this carefully is critical — long-running agents accumulate context until they hit the ceiling.

Fan-out — splitting one task into multiple parallel tasks (one → many).

Fan-in — collecting parallel results and merging them (many → one).
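Several of these terms are easiest to see in code. Below is a minimal sketch in plain Python (no framework; every name is illustrative) of a shared state object and a reducer that merges parallel writes to the same field:

```python
from typing import Any

# Shared state: the "working memory" every agent reads and writes.
state: dict[str, Any] = {"findings": [], "status": "running"}

def list_append_reducer(current: list, update: list) -> list:
    """Reducer: merge parallel writes by appending, instead of
    letting the last writer silently overwrite the others."""
    return current + update

def apply_update(state: dict, key: str, update: Any) -> None:
    if isinstance(state.get(key), list):
        state[key] = list_append_reducer(state[key], update)
    else:
        state[key] = update  # plain overwrite for scalar fields

# Two "agents" writing to the same field at (conceptually) the same time:
apply_update(state, "findings", ["IP reputation: clean"])
apply_update(state, "findings", ["Log pattern: anomalous spike"])
print(state["findings"])  # both writes survive
```

Without the reducer, the second write would replace the first and one agent's work would silently vanish.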


When NOT to Use Multi-Agent

This is the most important section in the module, which is why it comes before any of the patterns.

Multi-agent systems are powerful — and they're also slower, more expensive, harder to debug, and fail in ways that single-agent systems don't. Reported failure rates for multi-agent systems in production range from 41% to 86.7%, and roughly 40% of multi-agent pilots fail within six months of deployment. That's not a reason to avoid them — it's a reason to use them only when genuinely needed.

Always ask first: can a single agent with good tools solve this?

If yes — stop. Use a single agent. The overhead of coordination, additional latency, and new failure modes must be justified by genuine requirements.

Use multi-agent when:

  • The problem spans multiple domains that genuinely need specialization
  • Independent subtasks can be parallelized to meaningfully reduce time
  • Security boundaries must separate agents (a reviewer that literally cannot write files)
  • Context window limits require distributing memory across agents

Do not use multi-agent because it sounds more impressive. That's the most common reason people do it, and it produces complex systems that are harder to maintain and more likely to fail.


The 9 Core Patterns


Pattern 1: Sequential / Pipeline

What it is: Agents are chained in a fixed order. Each agent's output becomes the next agent's input. Nobody decides where to send their work — it's predetermined.

Input → [Agent A] → [Agent B] → [Agent C] → Output
              ↑           ↑           ↑
         adds value   adds value  adds value

The analogy: An assembly line. Each station does one thing and passes to the next. No station decides where the product goes next.

When to use:

  • Multi-stage processes with clear linear dependencies
  • Progressive refinement: draft → review → edit → publish
  • Each stage genuinely adds value the next stage depends on

When not to use:

  • Stages can run independently (use parallel instead)
  • A failure early on would invalidate all downstream work

Why it fails: If Agent A produces bad output, Agents B and C will dutifully process that bad output and produce their own polished bad output. Failures cascade. Validation between stages is not optional.

Real example: A law firm's contract pipeline — Template Selection Agent → Clause Customization Agent → Regulatory Compliance Agent → Risk Assessment Agent. Each stage requires the complete output of the prior stage.
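The pattern is essentially function composition with validation between stages. A minimal sketch, using stand-in functions rather than real agents (the agent bodies and validation rule are illustrative):

```python
def draft_agent(task: str) -> str:
    return f"DRAFT for: {task}"

def review_agent(text: str) -> str:
    return text + " [reviewed]"

def edit_agent(text: str) -> str:
    return text + " [edited]"

def validate(stage: str, output: str) -> str:
    # Validation between stages is not optional: a bad output here
    # would cascade through every downstream agent.
    if not output or not output.strip():
        raise ValueError(f"{stage} produced empty output")
    return output

def pipeline(task: str) -> str:
    out = task
    for name, agent in [("draft", draft_agent),
                        ("review", review_agent),
                        ("edit", edit_agent)]:
        out = validate(name, agent(out))
    return out

print(pipeline("NDA for vendor X"))
```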


Pattern 2: Concurrent / Parallel (Fan-Out / Fan-In)

What it is: Multiple agents receive the same input simultaneously, work independently, and their results are collected and synthesized by an aggregator.

              → [Agent A] →
Input →       → [Agent B] →   [Aggregator] → Output
              → [Agent C] →

The analogy: Asking three independent consultants to assess the same business problem simultaneously, then combining their reports.

When to use:

  • Multiple independent perspectives on the same input add value
  • Time-sensitive tasks where parallelism reduces latency
  • The analyses are genuinely independent (no agent needs another's result)

When not to use:

  • Agents need to build on each other's work
  • Results may contradict and you have no clear conflict resolution strategy
  • Resource costs make parallelism prohibitive

Why it fails: Without a good aggregation strategy, conflicting agent outputs create chaos. You need a principled way to reconcile disagreement — majority voting, a weighted merge, or a synthesizer agent.

Real example: Financial analysis — Fundamental Analysis Agent + Technical Analysis Agent + Sentiment Analysis Agent + ESG Agent all analyze the same stock simultaneously. Results synthesized into one recommendation. Research shows up to 81% performance improvement on parallel tasks compared to sequential.
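Fan-out/fan-in maps naturally onto a thread pool. A sketch with stand-in analyst functions (the agents and the aggregation strategy are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def fundamental_agent(ticker: str) -> str:
    return f"{ticker}: fundamentals look stable"

def technical_agent(ticker: str) -> str:
    return f"{ticker}: uptrend on the weekly chart"

def sentiment_agent(ticker: str) -> str:
    return f"{ticker}: news sentiment mildly positive"

def aggregate(reports: list) -> str:
    # Fan-in: a principled merge. Plain concatenation here; in practice
    # this is majority voting, weighting, or a synthesizer agent.
    return " | ".join(reports)

def analyze(ticker: str) -> str:
    agents = [fundamental_agent, technical_agent, sentiment_agent]
    # Fan-out: every agent gets the same input and runs concurrently.
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        reports = list(pool.map(lambda a: a(ticker), agents))
    return aggregate(reports)

print(analyze("ACME"))
```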


Pattern 3: Orchestrator-Worker (Hub-and-Spoke)

What it is: A central orchestrator receives a task, dynamically decides how to break it down, routes subtasks to specialized workers, observes their results, and decides next steps. The problem structure emerges at runtime — the orchestrator adapts based on what comes back.

User → [Orchestrator]
           ↓ ↓ ↓  (dynamic routing decisions)
    [Worker A] [Worker B] [Worker C]
           ↓ ↓ ↓  (results returned)
       [Orchestrator synthesizes]
           ↓
         Output

The key distinction from Sequential: Sequential routing is predetermined. Orchestrator-Worker routing is decided dynamically by the orchestrator based on partial results. The orchestrator can call Worker B before Worker A if that's what the situation requires.

The analogy: A project manager who adapts the plan as work comes in, rather than following a fixed script.

When to use:

  • The problem structure isn't fully known upfront
  • Different subtasks need different specialists
  • The orchestrator needs to react to intermediate results

When not to use:

  • The task sequence is fully predictable from the start (Sequential is simpler)
  • High throughput is required (the orchestrator becomes a bottleneck)

Why it fails: The orchestrator's context window fills up as it receives results from every worker. Long-running systems need active context management or the orchestrator degrades.
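A sketch of the core loop, with the routing decision stubbed out (in a real system it would be an LLM call; all names here are illustrative). Note the hard iteration cap:

```python
from typing import Optional

# Hypothetical worker registry; real workers would call an LLM or a tool.
def search_worker(task: str) -> dict:
    return {"kind": "facts", "data": f"facts about {task}"}

def code_worker(task: str) -> dict:
    return {"kind": "code", "data": f"code for {task}"}

WORKERS = {"search": search_worker, "code": code_worker}

def route(task: str, results: list) -> Optional[str]:
    """The orchestrator's routing decision, made at runtime from partial
    results. Stubbed here; in a real system this is an LLM call."""
    if not results:
        return "search"        # nothing yet: research first
    if results[-1]["kind"] == "facts":
        return "code"          # facts in hand: write code next
    return None                # done

def orchestrate(task: str, max_steps: int = 5) -> list:
    results: list = []
    for _ in range(max_steps):  # hard iteration cap: no infinite loops
        worker = route(task, results)
        if worker is None:
            break
        results.append(WORKERS[worker](task))
    return results

print(orchestrate("parse log files"))
```

The essential property: the next worker is chosen from the results so far, not from a fixed script.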


Pattern 4: Supervisor / Hierarchical

What it is: A supervisor sits at the top of a tree. It delegates to workers (or sub-supervisors), monitors their outputs, validates completion, and synthesizes the final result. Multiple layers are possible.

        [Top Supervisor]
         ↓            ↓
[Supervisor A]    [Supervisor B]
   ↓      ↓          ↓      ↓
Worker  Worker    Worker  Worker

The key distinction from Orchestrator-Worker: Orchestrator-Worker is typically one level deep and routing-focused. Supervisor/Hierarchical explicitly supports multiple layers with quality assurance gates at each level — supervisors validate, not just route.

When to use:

  • Complex multi-domain workflows requiring auditability
  • Finance, healthcare, legal — where every decision needs a reviewable chain of responsibility
  • Enterprise workflows with clear departmental separation

When not to use:

  • Coordination overhead outweighs the benefit
  • Latency is critical (hierarchical routing adds hops at every level)

Why it fails: Every task must travel up and down the hierarchy. In a 3-level hierarchy, a simple question makes 6 trips before getting an answer. Hierarchy should only exist where it genuinely adds oversight value.


Pattern 5: Handoff / Routing / Triage

What it is: Agents dynamically transfer full control to another agent when they reach the limit of their expertise. Only one agent is active at a time. The sequence is not predetermined — it emerges from each agent's own self-assessment as they work.

[Triage Agent] → determines expertise needed
       ↓
[Specialist A] → hits its limit, hands off
       ↓
[Specialist B] → resolves or hands off further
       ↓
     Output

The analogy: A hospital triage system. You arrive, triage determines urgency, sends you to general intake, who sends you to a specialist, who may send you to a sub-specialist. Full context transfers with you at each step.

When to use:

  • The right specialist for a task isn't known upfront
  • Multiple specialized domains where only one is needed at a time
  • Clear signals exist for when an agent should hand off

When not to use:

  • The right agent sequence is predictable from the initial input
  • Risk of infinite handoff loops between agents is hard to prevent

Why it fails: Agents can ping-pong. Agent A decides it can't handle something and hands to Agent B. Agent B decides it's actually Agent A's domain and hands back. Without explicit loop prevention, this cycles forever.

Real example (Microsoft): Telecom customer service — Triage Agent handles common issues → hands off to Technical Infrastructure Agent for network problems → who may hand off to Financial Resolution Agent for billing disputes.
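A sketch of a handoff loop with explicit ping-pong prevention (the agents are keyword-matching stand-ins; the loop guard is the point):

```python
def triage(ticket, history):
    # Routes on a crude keyword check; a real triage agent would classify.
    return ("tech", ticket) if "network" in ticket else ("billing", ticket)

def tech(ticket, history):
    if "refund" in ticket:          # hits the limit of its expertise
        return ("billing", ticket)  # ...and hands off
    return ("done", f"tech resolved: {ticket}")

def billing(ticket, history):
    return ("done", f"billing resolved: {ticket}")

AGENTS = {"triage": triage, "tech": tech, "billing": billing}
MAX_HANDOFFS = 5  # hard budget: handoff chains cannot run forever

def run(ticket: str) -> str:
    current, history = "triage", []
    for _ in range(MAX_HANDOFFS):
        nxt, payload = AGENTS[current](ticket, history)
        history.append(current)
        if nxt == "done":
            return payload
        if nxt in history:  # refuse to ping-pong back to a prior agent
            raise RuntimeError(f"handoff loop detected: {history + [nxt]}")
        current = nxt
    raise RuntimeError("handoff budget exhausted")

print(run("network outage with refund request"))
```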


Pattern 6: Group Chat / Roundtable

What it is: Multiple agents participate in a shared conversation thread. A chat manager controls who speaks next. Agents read the accumulating thread and contribute from their specialization. The conversation continues until a termination condition is met.

[Chat Manager controls turn order]
        ↓
Agent A posts → Agent B responds → Agent C responds → Agent A revises → ... → consensus

The Maker-Checker variant: A Generator agent creates output → a Critic agent evaluates it → if rejected, Generator revises → repeat until approved. This is one of the most useful multi-agent patterns in practice.

When to use:

  • Decision-making that benefits from debate and multiple perspectives
  • Creative brainstorming
  • Quality assurance requiring iterative refinement
  • Human-in-the-loop scenarios (a human can participate in the conversation)

When not to use:

  • Real-time tasks where discussion overhead is unacceptable
  • More than ~3 agents (conversations become unmanageable)
  • No clear termination condition (it will loop forever without one)
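The Maker-Checker variant is small enough to sketch directly (generator and critic are stand-ins for LLM calls; the termination condition is the point):

```python
def generator(task, feedback=None):
    # Stand-in for an LLM call; feedback from the critic steers revision.
    draft = f"answer to {task}"
    if feedback:
        draft += " (revised)"
    return draft

def critic(draft):
    # Stand-in critic: approves only drafts that went through a revision.
    if "(revised)" in draft:
        return True, "approved"
    return False, "needs a revision pass"

def maker_checker(task, max_rounds=3):
    feedback = None
    for _ in range(max_rounds):  # termination condition: hard round cap
        draft = generator(task, feedback)
        ok, feedback = critic(draft)
        if ok:
            return draft
    raise RuntimeError("no approved draft within the round budget")

print(maker_checker("summarize the incident"))
```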


Pattern 7: Swarm

What it is: Multiple peer agents work independently in parallel without any central coordinator. Each agent interacts directly with the environment. Collective behavior emerges from their independent actions — not from top-down direction.

The analogy: An ant colony. No ant is in charge. Each ant follows local rules and responds to local stimuli. The colony-level behavior — building a nest, finding food — emerges from millions of simple individual actions.

When to use:

  • High-coverage tasks where the search space is large (deep web research across many sources)
  • Fault tolerance matters more than strict consistency
  • Redundancy and diverse exploration are valuable

When not to use:

  • Token budget is tight — swarms burn tokens extremely fast
  • Strict consistency is required
  • You need predictable, auditable execution paths

Critical rule: Always pair a swarm with a deliberate aggregation agent. Swarms that run standalone without a consolidation phase produce volume, not answers.


Pattern 8: Magentic / Dynamic Orchestration

What it is: A manager agent builds and refines a task ledger as it discovers what needs to be done. Rather than executing a predetermined plan, the manager iterates — consulting agents, building a plan, refining it, and only then executing. The task list itself evolves as the work reveals new requirements.

When to use:

  • Genuinely open-ended problems with no predetermined solution path
  • Incident response, where the remediation plan must adapt as the situation evolves
  • Research planning, where what needs to be investigated changes based on early findings

When not to use:

  • The solution path is known — determinism is cheaper and faster
  • Time pressure is high (Magentic is slow — it deliberates before acting)
  • Total cost is critical (Magentic's cost is hard to predict)


Pattern 9: Blackboard

What it is: A shared knowledge space that all agents can read from and write to. No central coordinator assigns tasks — agents monitor the board and act when they see something they can contribute to.

The analogy: A whiteboard in a war room. Anyone can write on it, anyone can read it, and people step up to contribute what they know.

When to use:

  • Creative collaboration where specialists build on each other's partial contributions
  • Research synthesis where any agent might have relevant expertise at any point

Watch out for: Write contention — two agents writing the same field simultaneously. Requires serialization or reducers. Also, losing shared state is catastrophic because no individual agent has a backup.
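A sketch of a blackboard that serializes writes with a lock (the contributing agents are stand-ins):

```python
import threading

class Blackboard:
    """Shared knowledge space. The lock serializes writes so agents
    posting at the same moment cannot overwrite each other."""
    def __init__(self):
        self._lock = threading.Lock()
        self._entries: list = []

    def post(self, agent: str, note: str) -> None:
        with self._lock:
            self._entries.append((agent, note))

    def read(self) -> list:
        with self._lock:
            return list(self._entries)

board = Blackboard()

def contributor(name: str, note: str) -> None:
    # An agent posts when it sees something it can contribute to;
    # unconditional here for brevity.
    board.post(name, note)

threads = [threading.Thread(target=contributor,
                            args=(f"agent-{i}", f"finding {i}"))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(board.read()))  # all four contributions survive
```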


How to Choose the Right Pattern

The single most important question: does the problem structure emerge at runtime, or is it known upfront?

Is the structure known upfront?
    YES → Is it linear?
            YES → Sequential
            NO  → Is it parallelizable?
                    YES → Concurrent
    NO  → Is only one agent active at a time?
            YES → Handoff/Routing
            NO  → Orchestrator-Worker
                  (or Magentic if truly open-ended)

Special cases:
  Need debate/consensus? → Group Chat
  Need high coverage, cost secondary? → Swarm
  Need multi-layer audit trail? → Supervisor/Hierarchical
  Specialists build on partial work? → Blackboard

The hybrid principle: Most real systems combine patterns. A typical production system might use Sequential for the main pipeline, Concurrent for the research phase within that pipeline, and a Maker-Checker loop for quality validation. Don't look for one pattern — look for the right pattern for each stage.


How Agents Communicate

Six ways information flows between agents:

1. Direct message passing — Agent A calls Agent B via an API, waits for a response. Simple, tightly coupled. Good for sequential patterns.

2. Shared state object — All agents read from and write to a common data structure (like LangGraph's TypedDict state). Each agent writes to its own keys to avoid conflicts.

3. Message queues (async) — Agents publish to queues (Kafka, Redis, RabbitMQ). Other agents subscribe and process independently. Decoupled, fault-tolerant, high-throughput.

4. Accumulating conversation thread — All agents see the growing message history (how AutoGen and Group Chat work). Context window fills up faster with this approach.

5. Tool calls / Agent-as-Tool — One agent invokes another via the tool interface. The parent agent calls the sub-agent as if it were a function and gets a result back.

6. Blackboard / shared memory — A public knowledge space agents post to and read from. No direct addressing between agents.
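Mechanism 3 can be sketched with the standard library's queue module standing in for Kafka or Redis (the decoupling is the point; the worker logic is illustrative):

```python
import queue
import threading

tasks: "queue.Queue" = queue.Queue()
results: "queue.Queue" = queue.Queue()

def worker_agent():
    # Subscriber: pulls work independently. Publisher and subscriber
    # never call each other directly, so either side can be replaced.
    while True:
        task = tasks.get()
        if task is None:   # sentinel value: shut down cleanly
            break
        results.put(f"processed: {task}")

t = threading.Thread(target=worker_agent)
t.start()
for job in ["enrich lead", "score lead"]:
    tasks.put(job)         # publish; no knowledge of who consumes
tasks.put(None)
t.join()

out = [results.get() for _ in range(results.qsize())]
print(out)
```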


Trust, Security, and Permissions

Multi-agent systems introduce a critical security property that single-agent systems don't have: a compromised agent can propagate malicious instructions to the entire network.

If an agent reads a malicious web page that contains a prompt injection attack, and that agent passes its output to other agents, the attack can spread. This isn't theoretical — it's a real production concern.

The trust hierarchy to apply:

  • Instructions arriving via system prompt → high trust (you wrote them)
  • Outputs from other agents → medium trust (validate before acting on)
  • Content from external sources (web, files, APIs) → low trust (treat as potentially adversarial)

The principle of least privilege: Give each agent only the permissions it actually needs. A research agent doesn't need write access. A reviewer doesn't need database credentials. If an agent gets compromised, limited permissions limit the blast radius.

Structured schemas for agent-to-agent communication: When agents pass JSON instead of free text, format mismatches are caught early. When a planner outputs YAML but the executor expects JSON, you get a cascading failure. Use typed interfaces between agents, not natural language.
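A sketch of such a typed boundary using only the standard library (the message shape is illustrative; in practice this is often a Pydantic model):

```python
import json
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentMessage:
    """Typed interface between agents: a mismatch fails loudly at the
    boundary instead of cascading through downstream agents."""
    sender: str
    kind: str
    payload: str

    def __post_init__(self):
        if self.kind not in {"plan", "result", "error"}:
            raise ValueError(f"unknown message kind: {self.kind!r}")

def parse_message(raw: str) -> AgentMessage:
    data = json.loads(raw)       # malformed JSON fails here
    return AgentMessage(**data)  # missing or extra fields fail here

msg = parse_message('{"sender": "planner", "kind": "plan", "payload": "step 1"}')
print(msg.kind)
```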


The 10 Failure Modes to Design Against

These aren't edge cases. They're the normal ways multi-agent systems fail in production.

1. Coordination failures (36.94% of production issues) — Agents misinterpret their roles, duplicate work, or produce conflicting outputs. Fix: structured JSON schemas for inter-agent communication, not natural language.

2. Cascading errors — A format mismatch in step 1 (YAML vs JSON) propagates and amplifies through every downstream agent. Fix: validate agent output before passing it to the next agent.

3. Infinite loops — No termination condition. Fix: every loop gets a hard maximum iteration count. No exceptions.

4. Context window exhaustion — Every agent interaction adds tokens. Long-running systems fill the window. Fix: summarize and compact between agent handoffs; pass only what the next agent actually needs.

5. Prompt injection propagation — Malicious content in external data hijacks one agent, which then passes bad instructions to others. Fix: treat external content as untrusted; validate at every boundary.

6. Parallel state conflicts — Two concurrent agents write to the same state field without a reducer. One overwrites the other nondeterministically. Fix: define reducers for any field that parallel agents might write.

7. Token cost explosion — A 4-agent crew using GPT-4 can cost 10-50x a single-agent approach. Fix: model routing — cheap models for simple tasks, expensive models only where needed.

8. Manager agent bottleneck — In hierarchical systems, every task routes through the supervisor. If the supervisor is slow or confused, the whole system stalls. Fix: use a capable model for supervisors; test supervisor routing logic in isolation.

9. Verification gaps — Nobody's watching what agents actually do. Silent failures accumulate. Fix: instrument everything; use LLM-as-judge evaluations (not exact-match assertions, which don't work for probabilistic outputs).

10. Information withholding — Agents fail to share important data or deviate from their objectives. Fix: structured output schemas; orchestrator-level validation of every output before passing downstream.


The 10 Framework-Agnostic Design Principles

These apply regardless of whether you use LangGraph, CrewAI, AutoGen, or anything else.

  1. Match architecture to task structure — the wrong pattern can reduce performance by 70% compared to the right one. Choose deliberately.
  2. Start with the lowest complexity that works — single agent first, multi-agent only when justified.
  3. Use structured schemas for inter-agent communication — JSON/Pydantic, not natural language.
  4. One resource owner per resource — each database, API, or file belongs to exactly one agent.
  5. Zero trust between agents — validate outputs at every boundary; don't blindly aggregate.
  6. Design for observability first — trace every agent operation from day one.
  7. Match model to role — don't use Opus for classification tasks; use Haiku.
  8. Explicit failure recovery — timeouts, retries, circuit breakers, max iterations on every loop.
  9. Context compaction between agents — pass cleaned outputs, not full reasoning histories.
  10. Patterns are composable — combine the right pattern for each stage of your system.
