Basics of AI & Agents Curriculum
Phase 4: Multi-Agent Systems · 20 min read

Module 4.1

Multi-Agent Architecture Patterns

What Are Multi-Agent Architecture Patterns?

A multi-agent system is a collection of AI agents that collaborate to complete a task no single agent could handle alone. Each agent has a specialized role, its own context, and its own tools. They communicate, pass results, and coordinate to reach a shared goal.

Architecture patterns are proven blueprints for how those agents should be organized. Not the code, but the structure. How they relate to each other. Who coordinates whom. How information flows. Whether agents work in sequence, in parallel, or in a dynamic network.

Think of it like organizational design for a company. Before you hire anyone, you decide: is this a flat team where everyone collaborates? A hierarchy with a manager? A pipeline where each person hands off to the next? The answer shapes everything: communication overhead, who can block whom, how fast work gets done, and what happens when something goes wrong.

The same decisions exist for multi-agent systems. And like organizational design, the wrong structure for the job causes dysfunction no matter how talented the individual agents are.


Real-World Use Cases

Before learning the patterns, it helps to see what these systems actually do in the real world:

LinkedIn: SQL Bot. Employees ask data questions in plain English. A multi-agent system understands the question, finds the right database tables, writes a SQL query, executes it, and if there's an error, diagnoses and fixes it, then retries. The retry loop is the key reason this needed a multi-agent architecture. A straight chain can't loop back on itself.

Uber: Code migration. Uber needed to migrate a massive codebase across their developer platform. They built a network of specialized agents where each one owns exactly one step: reading the old code, generating the migrated version, writing unit tests, validating correctness. A task that would take engineers weeks runs autonomously.

Elastic: Threat detection. When a security threat arrives, a routing agent decides which specialist agents to dispatch: one checks IP reputation, one analyzes log patterns, one queries threat databases. They all run simultaneously, and their findings are synthesized into a response. Parallel execution, centralized synthesis.

PwC: Code generation pipeline. A team of agents collaborates: one generates the code, another reviews it, another writes tests, another validates compliance. Code accuracy went from 10% to 70%.

DocuSign: Lead qualification. Multiple agents each extract information from different internal systems (CRM, marketing platform, product usage data), then a synthesis agent scores and qualifies the lead. No single agent has access to all the data; they work in parallel, each contributing their piece.

The pattern across all of these: the task had natural boundaries that genuinely required specialization, parallel work, or loops that a single agent can't do.


Key Terms to Know

Before the patterns, a quick glossary. These words will appear constantly:

Orchestrator: an agent whose job is to coordinate other agents. It breaks tasks down, routes work, and synthesizes results. It typically doesn't do the domain work itself.

Worker: a specialized agent that does the actual work. A research worker searches. A code worker writes code. A review worker checks quality.

Supervisor: similar to an orchestrator, but specifically implies a hierarchical authority relationship; the supervisor validates and approves worker outputs, not just routes them.

Handoff: one agent passing full control of a task to another. Like a relay baton: only one agent runs at a time.

State: the shared data object that agents read from and write to. It's the "working memory" of the entire system.

Reducer: a rule that controls how state updates merge when multiple agents write to the same field at the same time.

Termination condition: the rule that says "stop." Without one, agent loops run forever. Every loop must have one.

Context window: the maximum amount of text a model can "see" at once. In multi-agent systems, managing this carefully is critical; long-running agents accumulate context until they hit the ceiling.

Fan-out: splitting one task into multiple parallel tasks (one → many).

Fan-in: collecting parallel results and merging them (many → one).
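To make the state/reducer vocabulary concrete, here's a minimal Python sketch. It assumes nothing beyond plain dicts as state and `operator.add` as a list-concatenation reducer (no particular framework): a reducer merges two agents' writes to the same field instead of letting one overwrite the other.

```python
import operator

# Hypothetical helper: merge an agent's state update into shared state,
# using a per-field reducer when one is defined, plain overwrite otherwise.
def merge_update(state: dict, update: dict, reducers: dict) -> dict:
    merged = dict(state)
    for key, value in update.items():
        if key in merged and key in reducers:
            merged[key] = reducers[key](merged[key], value)  # merge, don't clobber
        else:
            merged[key] = value
    return merged

reducers = {"findings": operator.add}  # concatenate lists instead of overwriting
state = {"findings": ["from agent A"]}
state = merge_update(state, {"findings": ["from agent B"]}, reducers)
# state["findings"] now contains both agents' contributions
```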


When NOT to Use Multi-Agent

This section is foundational, which is why it comes before the patterns.

Multi-agent systems are powerful, but they're also slower, more expensive, harder to debug, and fail in ways that single-agent systems don't. Research into production multi-agent systems has documented failure rates ranging from 41–86.7% depending on task complexity and configuration. Many early pilots encounter significant challenges within the first six months. That's not a reason to avoid them; it's a reason to only use them when genuinely needed.

Always ask first: can a single agent with good tools solve this?

If yes, stop. Use a single agent. The overhead of coordination, additional latency, and new failure modes must be justified by genuine requirements.

Use multi-agent when:

  • The problem spans multiple domains that genuinely need specialization
  • Independent subtasks can be parallelized to meaningfully reduce time
  • Security boundaries must separate agents (a reviewer that literally cannot write files)
  • Context window limits require distributing memory across agents

Avoid adding multi-agent complexity just because it sounds more capable. It's a common mistake, and the added complexity often makes systems harder to maintain and more likely to fail.


The 9 Core Patterns


Pattern 1: Sequential / Pipeline

What it is: Agents are chained in a fixed order. Each agent's output becomes the next agent's input. Nobody decides where to send their work; it's predetermined.

Input → [Agent A] → [Agent B] → [Agent C] → Output
            ↑           ↑           ↑
       adds value  adds value  adds value

The analogy: An assembly line. Each station does one thing and passes to the next. No station decides where the product goes next.

When to use:

  • Multi-stage processes with clear linear dependencies
  • Progressive refinement: draft → review → edit → publish
  • Each stage genuinely adds value the next stage depends on

When not to use:

  • Stages can run independently (use parallel instead)
  • A failure early on would invalidate all downstream work

Why it fails: If Agent A produces bad output, Agents B and C will dutifully process that bad output and produce their own polished bad output. Failures cascade. Validation between stages is not optional.

Real example: A law firm's contract pipeline. Template Selection Agent → Clause Customization Agent → Regulatory Compliance Agent → Risk Assessment Agent. Each stage requires the complete output of the prior stage.
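A sketch of the pipeline shape in Python, with a validation gate between stages. The stage functions are illustrative stand-ins for LLM-backed agents, not any real framework's API:

```python
# Illustrative stage functions; in a real system each would be an agent call.
def draft(text):   return f"DRAFT: {text}"
def review(text):  return text.replace("DRAFT", "REVIEWED")
def publish(text): return text.replace("REVIEWED", "PUBLISHED")

def run_pipeline(task, stages, validate):
    result = task
    for stage in stages:
        result = stage(result)
        # Validation gate between stages: stop here rather than cascade.
        if not validate(result):
            raise ValueError(f"{stage.__name__} produced invalid output")
    return result

out = run_pipeline("q3 report", [draft, review, publish],
                   validate=lambda s: bool(s.strip()))
```

The gate is the point of the sketch: a bad early output fails the run at the stage that produced it, instead of being polished by every downstream agent.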


Pattern 2: Concurrent / Parallel (Fan-Out / Fan-In)

What it is: Multiple agents receive the same input simultaneously, work independently, and their results are collected and synthesized by an aggregator.

          → [Agent A] →
Input →   → [Agent B] →   [Aggregator] → Output
          → [Agent C] →

The analogy: Asking three independent consultants to assess the same business problem simultaneously, then combining their reports.

When to use:

  • Multiple independent perspectives on the same input add value
  • Time-sensitive tasks where parallelism reduces latency
  • The analyses are genuinely independent (no agent needs another's result)

When not to use:

  • Agents need to build on each other's work
  • Results may contradict and you have no clear conflict resolution strategy
  • Resource costs make parallelism prohibitive

Why it fails: Without a good aggregation strategy, conflicting agent outputs create chaos. You need a principled way to reconcile disagreement: majority voting, a weighted merge, or a synthesizer agent.

Real example: Financial analysis. Fundamental Analysis Agent + Technical Analysis Agent + Sentiment Analysis Agent + ESG Agent all analyze the same stock simultaneously. Results synthesized into one recommendation. Research has shown up to 81% performance improvement on certain parallel tasks compared to sequential execution.
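A minimal fan-out/fan-in sketch using Python threads. The analyst functions are placeholders for LLM calls, and averaging is just one possible aggregation strategy:

```python
from concurrent.futures import ThreadPoolExecutor

# Placeholder analysts; each would be an independent LLM-backed agent.
def fundamental(ticker): return {"agent": "fundamental", "score": 0.7}
def technical(ticker):   return {"agent": "technical", "score": 0.4}
def sentiment(ticker):   return {"agent": "sentiment", "score": 0.6}

def fan_out_fan_in(task, agents, aggregate):
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda agent: agent(task), agents))  # fan-out
    return aggregate(results)                                        # fan-in

def average_score(results):
    # One simple conflict-resolution strategy: a plain average.
    return sum(r["score"] for r in results) / len(results)

recommendation = fan_out_fan_in("ACME", [fundamental, technical, sentiment],
                                average_score)
```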


Pattern 3: Orchestrator-Worker (Hub-and-Spoke)

What it is: A central orchestrator receives a task, dynamically decides how to break it down, routes subtasks to specialized workers, observes their results, and decides next steps. The problem structure emerges at runtime; the orchestrator adapts based on what comes back.

User → [Orchestrator]
           ↓ ↓ ↓  (dynamic routing decisions)
    [Worker A] [Worker B] [Worker C]
           ↓ ↓ ↓  (results returned)
       [Orchestrator synthesizes]
           ↓
         Output

The key distinction from Sequential: Sequential routing is predetermined. Orchestrator-Worker routing is decided dynamically by the orchestrator based on partial results. The orchestrator can call Worker B before Worker A if that's what the situation requires.

The analogy: A project manager who adapts the plan as work comes in, rather than following a fixed script.

When to use:

  • The problem structure isn't fully known upfront
  • Different subtasks need different specialists
  • The orchestrator needs to react to intermediate results

When not to use:

  • The task sequence is fully predictable from the start (Sequential is simpler)
  • High throughput is required (the orchestrator becomes a bottleneck)

Why it fails: The orchestrator's context window fills up as it receives results from every worker. Long-running systems need active context management or the orchestrator degrades.
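A toy version of dynamic routing, where a stub routing rule stands in for what would normally be an LLM decision, and a hard iteration cap serves as the termination condition:

```python
# Stub workers; a real orchestrator would dispatch to LLM-backed agents.
def search_worker(state):
    state["facts"] = ["fact 1", "fact 2"]
    return state

def write_worker(state):
    state["report"] = f"report based on {len(state['facts'])} facts"
    return state

def orchestrate(task, max_steps=10):
    state = {"task": task, "facts": None, "report": None}
    for _ in range(max_steps):             # hard iteration cap
        if state["facts"] is None:         # routing decided at runtime,
            state = search_worker(state)   # based on the current state
        elif state["report"] is None:
            state = write_worker(state)
        else:
            return state["report"]
    raise RuntimeError("orchestrator hit its iteration limit")
```

The orchestrator inspects state after every worker call, which is exactly why its context fills up in real systems: every result flows back through it.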


Pattern 4: Supervisor / Hierarchical

What it is: A supervisor sits at the top of a tree. It delegates to workers (or sub-supervisors), monitors their outputs, validates completion, and synthesizes the final result. Multiple layers are possible.

        [Top Supervisor]
         ↓            ↓
[Supervisor A]    [Supervisor B]
   ↓      ↓          ↓      ↓
Worker  Worker    Worker  Worker

The key distinction from Orchestrator-Worker: Orchestrator-Worker is typically one level deep and routing-focused. Supervisor/Hierarchical explicitly supports multiple layers with quality assurance gates at each level: supervisors validate, not just route.

When to use:

  • Complex multi-domain workflows requiring auditability
  • Finance, healthcare, legal: domains where every decision needs a reviewable chain of responsibility
  • Enterprise workflows with clear departmental separation

When not to use:

  • Coordination overhead outweighs the benefit
  • Latency is critical (hierarchical routing adds hops at every level)

Why it fails: Every task must travel up and down the hierarchy. In a 3-level hierarchy, a simple question makes 6 trips before getting an answer. Hierarchy should only exist where it genuinely adds oversight value.


Pattern 5: Handoff / Routing / Triage

What it is: Agents dynamically transfer full control to another agent when they reach the limit of their expertise. Only one agent is active at a time. The sequence is not predetermined; it emerges from each agent's own self-assessment as they work.

[Triage Agent] → determines expertise needed
       ↓
[Specialist A] → hits its limit, hands off
       ↓
[Specialist B] → resolves or hands off further
       ↓
     Output

The analogy: A hospital triage system. You arrive, triage determines urgency, sends you to general intake, who sends you to a specialist, who may send you to a sub-specialist. Full context transfers with you at each step.

When to use:

  • The right specialist for a task isn't known upfront
  • Multiple specialized domains where only one is needed at a time
  • Clear signals exist for when an agent should hand off

When not to use:

  • The right agent sequence is predictable from the initial input
  • Risk of infinite handoff loops between agents is hard to prevent

Why it fails: Agents can ping-pong. Agent A decides it can't handle something and hands to Agent B. Agent B decides it's actually Agent A's domain and hands back. Without explicit loop prevention, this cycles forever.

Real example (Microsoft): Telecom customer service. Triage Agent handles common issues → hands off to Technical Infrastructure Agent for network problems → who may hand off to Financial Resolution Agent for billing disputes.
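The handoff loop with explicit ping-pong protection can be sketched like this. Agent names and routing rules are illustrative, not any framework's API:

```python
# Each agent returns either (next_agent, None) to hand off,
# or (None, answer) to resolve. Routing rules are illustrative.
def triage(msg):
    return ("tech", None) if "network" in msg else (None, "resolved by triage")

def tech(msg):
    return ("billing", None) if "refund" in msg else (None, "resolved by tech")

def billing(msg):
    return (None, "resolved by billing")

AGENTS = {"triage": triage, "tech": tech, "billing": billing}

def handle(msg, start="triage", max_handoffs=5):
    current = start
    for _ in range(max_handoffs):          # hard cap prevents ping-pong loops
        target, answer = AGENTS[current](msg)
        if answer is not None:
            return answer
        current = target                   # full control transfers here
    raise RuntimeError("handoff limit exceeded")
```

The `max_handoffs` budget is the loop-prevention mechanism: if A and B keep bouncing a task between themselves, the run fails loudly instead of cycling forever.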


Pattern 6: Group Chat / Roundtable

What it is: Multiple agents participate in a shared conversation thread. A chat manager controls who speaks next. Agents read the accumulating thread and contribute from their specialization. The conversation continues until a termination condition is met.

[Chat Manager controls turn order]
        ↓
Agent A posts → Agent B responds → Agent C responds → Agent A revises → ... → consensus

The Maker-Checker variant: A Generator agent creates output → a Critic agent evaluates it → if rejected, Generator revises → repeat until approved. This is one of the most useful multi-agent patterns in practice.

When to use:

  • Decision-making that benefits from debate and multiple perspectives
  • Creative brainstorming
  • Quality assurance requiring iterative refinement
  • Human-in-the-loop scenarios (a human can participate in the conversation)

When not to use:

  • Real-time tasks where discussion overhead is unacceptable
  • More than ~3 agents (conversations become unmanageable)
  • No clear termination condition (it will loop forever without one)
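The Maker-Checker variant described above reduces to a short loop with a hard termination condition. The generator and critic here are stubs for LLM calls:

```python
# Stubs for LLM calls: the generator produces a draft, the critic
# either approves it or returns feedback for the next round.
def generator(feedback):
    return "draft v2 with fixes" if feedback else "draft v1"

def critic(draft):
    approved = "fixes" in draft
    return approved, (None if approved else "needs fixes")

def maker_checker(max_rounds=3):
    feedback = None
    for _ in range(max_rounds):            # the termination condition
        draft = generator(feedback)
        approved, feedback = critic(draft)
        if approved:
            return draft
    raise RuntimeError("no approved draft within max_rounds")

final = maker_checker()
```

`max_rounds` is doing the real work here: without it, a critic that never approves would loop the pair forever.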

Pattern 7: Swarm

What it is: Multiple peer agents work independently in parallel without any central coordinator. Each agent interacts directly with the environment. Collective behavior emerges from their independent actions β€” not from top-down direction.

The analogy: An ant colony. No ant is in charge. Each ant follows local rules and responds to local stimuli. The colony-level behavior β€” building a nest, finding food β€” emerges from millions of simple individual actions.

When to use:

  • High-coverage tasks where the search space is large (deep web research across many sources)
  • Fault tolerance matters more than strict consistency
  • Redundancy and diverse exploration are valuable

When not to use:

  • Token budget is tight; swarms burn tokens extremely fast
  • Strict consistency is required
  • You need predictable, auditable execution paths

Critical rule: Always pair a swarm with a deliberate aggregation agent. Swarms that run standalone without a consolidation phase produce volume, not answers.


Pattern 8: Magentic / Dynamic Orchestration

What it is: A manager agent builds and refines a task ledger as it discovers what needs to be done. Rather than executing a predetermined plan, the manager iterates: consulting agents, building a plan, refining it, and only then executing. The task list itself evolves as the work reveals new requirements.

When to use:

  • Genuinely open-ended problems with no predetermined solution path
  • Incident response, where the remediation plan must adapt as the situation evolves
  • Research planning, where what needs to be investigated changes based on early findings

When not to use:

  • The solution path is known; determinism is cheaper and faster
  • Time pressure is high (Magentic is slow; it deliberates before acting)
  • Total cost is critical (Magentic's cost is hard to predict)

Pattern 9: Blackboard

What it is: A shared knowledge space that all agents can read from and write to. No central coordinator assigns tasks; agents monitor the board and act when they see something they can contribute to.

The analogy: A whiteboard in a war room. Anyone can write on it, anyone can read it, and people step up to contribute what they know.

When to use:

  • Creative collaboration where specialists build on each other's partial contributions
  • Research synthesis where any agent might have relevant expertise at any point

Watch out for: Write contention, where two agents write the same field simultaneously; this requires serialization or reducers. Also, losing shared state is catastrophic because no individual agent has a backup.
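A minimal blackboard with serialized writes, assuming a simple lock rather than a full reducer mechanism. The agent behavior is illustrative:

```python
import threading

class Blackboard:
    """Shared knowledge space; a lock serializes writes to avoid contention."""
    def __init__(self):
        self._lock = threading.Lock()
        self._entries = []

    def post(self, author, content):
        with self._lock:                   # serialize concurrent writes
            self._entries.append((author, content))

    def read(self):
        with self._lock:
            return list(self._entries)

board = Blackboard()
board.post("researcher", "raw finding")
# A second agent monitors the board and reacts to what it sees.
if any(content == "raw finding" for _, content in board.read()):
    board.post("summarizer", "summary of raw finding")
```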


How to Choose the Right Pattern

The single most important question: does the problem structure emerge at runtime, or is it known upfront?

Is the structure known upfront?
    YES → Is it linear?
            YES → Sequential
            NO  → Is it parallelizable?
                    YES → Concurrent
    NO  → Does a single agent handle the task at any one time?
            YES → Handoff/Routing
            NO  → Orchestrator-Worker
                  (or Magentic if truly open-ended)

Special cases:
  Need debate/consensus? → Group Chat
  Need high coverage, cost secondary? → Swarm
  Need multi-layer audit trail? → Supervisor/Hierarchical
  Specialists build on partial work? → Blackboard

The hybrid principle: Most real systems combine patterns. A typical production system might use Sequential for the main pipeline, Concurrent for the research phase within that pipeline, and a Maker-Checker loop for quality validation. Don't look for one pattern; look for the right pattern for each stage.


How Agents Communicate

Six ways information flows between agents:

1. Direct message passing: Agent A calls Agent B via an API, waits for a response. Simple, tightly coupled. Good for sequential patterns.

2. Shared state object: All agents read from and write to a common data structure (like LangGraph's TypedDict state). Each agent writes to its own keys to avoid conflicts.

3. Message queues (async): Agents publish to queues (Kafka, Redis, RabbitMQ). Other agents subscribe and process independently. Decoupled, fault-tolerant, high-throughput.

4. Accumulating conversation thread: All agents see the growing message history (how AutoGen and Group Chat work). The context window fills up faster with this approach.

5. Tool calls / Agent-as-Tool: One agent invokes another via the tool interface. The parent agent calls the sub-agent as if it were a function and gets a result back.

6. Blackboard / shared memory: A public knowledge space agents post to and read from. No direct addressing between agents.
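Method 3 can be sketched with a stdlib queue standing in for Kafka or Redis. The producer and the worker never reference each other directly:

```python
import queue
import threading

tasks = queue.Queue()
results = queue.Queue()

def worker():
    # The consumer agent: processes whatever arrives, independently
    # of who produced it.
    while True:
        msg = tasks.get()
        if msg is None:                    # shutdown sentinel
            break
        results.put(msg.upper())           # stand-in for real agent work

t = threading.Thread(target=worker)
t.start()
tasks.put("analyze logs")                  # producer publishes and moves on
tasks.put(None)
t.join()
```

Because neither side holds a reference to the other, either can be replaced, scaled out, or restarted without touching its counterpart; that decoupling is the whole appeal of the queue approach.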


Trust, Security, and Permissions

Multi-agent systems introduce a critical security property that single-agent systems don't have: a compromised agent can propagate malicious instructions to the entire network.

If an agent reads a malicious web page that contains a prompt injection attack, and that agent passes its output to other agents, the attack can spread. This isn't theoretical; it's a real production concern.

The trust hierarchy to apply:

  • Instructions arriving via system prompt → high trust (you wrote them)
  • Outputs from other agents → medium trust (validate before acting on them)
  • Content from external sources (web, files, APIs) → low trust (treat as potentially adversarial)

The principle of least privilege: Give each agent only the permissions it actually needs. A research agent doesn't need write access. A reviewer doesn't need database credentials. If an agent gets compromised, limited permissions limit the blast radius.

Structured schemas for agent-to-agent communication: When agents pass JSON instead of free text, format mismatches are caught early. When a planner outputs YAML but the executor expects JSON, you get a cascading failure. Use typed interfaces between agents, not natural language.
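A sketch of a typed inter-agent message, using a stdlib dataclass as a dependency-free stand-in for a Pydantic model. The `PlanStep` type and its fields are hypothetical; the point is that a malformed payload fails loudly at the boundary instead of propagating:

```python
from dataclasses import dataclass

# Hypothetical message type for planner-to-executor communication.
@dataclass(frozen=True)
class PlanStep:
    action: str
    target: str

def parse_plan_step(raw: dict) -> PlanStep:
    missing = {"action", "target"} - raw.keys()
    if missing:
        # Fail at the boundary, not three agents downstream.
        raise ValueError(f"malformed plan step, missing: {sorted(missing)}")
    return PlanStep(action=str(raw["action"]), target=str(raw["target"]))

step = parse_plan_step({"action": "migrate", "target": "auth-service"})
```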


The 10 Failure Modes to Design Against

These aren't edge cases. They're the normal ways multi-agent systems fail in production.

1. Coordination failures (36.94% of production issues): Agents misinterpret their roles, duplicate work, or produce conflicting outputs. Fix: structured JSON schemas for inter-agent communication, not natural language.

2. Cascading errors: A format mismatch in step 1 (YAML vs JSON) propagates and amplifies through every downstream agent. Fix: validate agent output before passing it to the next agent.

3. Infinite loops: No termination condition. Fix: every loop gets a hard maximum iteration count. No exceptions.

4. Context window exhaustion: Every agent interaction adds tokens. Long-running systems fill the window. Fix: summarize and compact between agent handoffs; pass only what the next agent actually needs.

5. Prompt injection propagation: Malicious content in external data hijacks one agent, which then passes bad instructions to others. Fix: treat external content as untrusted; validate at every boundary.

6. Parallel state conflicts: Two concurrent agents write to the same state field without a reducer. One overwrites the other nondeterministically. Fix: define reducers for any field that parallel agents might write.

7. Token cost explosion: A 4-agent crew using GPT-4 can cost 10-50x a single-agent approach. Fix: model routing, with cheap models for simple tasks and expensive models only where needed.

8. Manager agent bottleneck: In hierarchical systems, every task routes through the supervisor. If the supervisor is slow or confused, the whole system stalls. Fix: use a capable model for supervisors; test supervisor routing logic in isolation.

9. Verification gaps: Nobody's watching what agents actually do. Silent failures accumulate. Fix: instrument everything; use LLM-as-judge evaluations (not exact-match assertions, which don't work for probabilistic outputs).

10. Information withholding: Agents fail to share important data or deviate from their objectives. Fix: structured output schemas; orchestrator-level validation of every output before passing downstream.
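A trivial sketch of the model-routing fix for failure mode 7. The model names and the difficulty heuristic are placeholders, not real APIs:

```python
# Placeholder model identifiers; a real router would use actual model IDs.
CHEAP, EXPENSIVE = "small-fast-model", "large-capable-model"

def pick_model(task: str) -> str:
    # Naive difficulty heuristic; a real router might use a classifier
    # or a cheap LLM call to triage the task first.
    hard = len(task.split()) > 25 or "analyze" in task.lower()
    return EXPENSIVE if hard else CHEAP

model = pick_model("classify this support ticket")
```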


The 10 Framework-Agnostic Design Principles

These apply regardless of whether you use LangGraph, CrewAI, AutoGen, or anything else.

  1. Match architecture to task structure: the wrong pattern can reduce performance by 70% compared to the right one. Choose deliberately.
  2. Start with the lowest complexity that works: single agent first, multi-agent only when justified.
  3. Use structured schemas for inter-agent communication: JSON/Pydantic, not natural language.
  4. One resource owner per resource: each database, API, or file belongs to exactly one agent.
  5. Zero trust between agents: validate outputs at every boundary; don't blindly aggregate.
  6. Design for observability first: trace every agent operation from day one.
  7. Match model to role: don't use Opus for classification tasks; use Haiku.
  8. Explicit failure recovery: timeouts, retries, circuit breakers, max iterations on every loop.
  9. Context compaction between agents: pass cleaned outputs, not full reasoning histories.
  10. Patterns are composable: combine the right pattern for each stage of your system.

Sources