AI Engineering Curriculum
Phase 3: Multi-Agent Systems

Module 3.6

Putting It All Together — Designing a Real Multi-Agent System

What This Module Is

You've learned the patterns (3.1), the frameworks (3.2–3.4), and how to operate them (3.5). This module shows how all of it connects when you actually sit down to design something real.

Most tutorials show you code first and explain later. This module does the opposite. We'll walk through the design of a complete production multi-agent system — an AI Research Team — making every decision step by step, explaining the reasoning behind each one, before showing a single line of code.

By the end, you won't just know how to copy a multi-agent system — you'll know how to design one.


The System We're Designing

AI Research Team: a multi-agent system that takes any research topic and produces a comprehensive, sourced report autonomously.

  • Input: a research topic or question
  • Output: a structured report with findings, analysis, and cited sources
  • Constraint: must be production-ready — checkpointed, observable, cost-controlled, with human review before publishing

This is the kind of system that's useful for consulting work (client research), product work (market analysis), and personal productivity. It's also a clean vehicle for applying every concept from Phase 3.


Step 1: Problem Analysis — Do We Even Need Multi-Agent?

Before picking a pattern, always ask the first question: can a single agent with good tools solve this?

For a research task: load one agent with web search, a PDF reader, and a writing tool. Give it a well-crafted system prompt. Ask for a report.

That will work for simple topics. It fails at scale for three reasons:

  1. Context window. A single agent doing deep research accumulates enormous context — hundreds of search results, scraped pages, drafts, revisions. It hits the ceiling.
  2. Specialization. A single agent trying to simultaneously be a researcher, fact-checker, and writer is doing three different cognitive jobs. Models perform better with focused roles.
  3. Parallelism. Web research and academic research can happen simultaneously. Sequential search doubles the latency for no reason.

Multi-agent is justified here. Next question: what shape should it take?


Step 2: Identifying Agent Boundaries

Draw the task on paper. What are the natural phases?

1. Research: gather raw information from multiple sources
2. Fact check: verify key claims
3. Synthesis: structure findings into coherent insights
4. Writing: produce the final polished report
5. Review: human approves before publishing

Which of these are genuinely independent? Research (web) and research (academic) can run simultaneously — neither depends on the other. Fact-checking depends on research being done first. Synthesis depends on both research and fact-checking. Writing depends on synthesis. Review waits for writing.

This gives us a natural agent structure:

[Web Researcher]  ←── run in parallel ──→  [Academic Researcher]
         ↓                                           ↓
         └──────────→  [Fact Checker]  ←─────────────┘
                       (waits for both)
                              ↓
                       [Synthesizer]
                              ↓
                         [Writer]
                              ↓
                   [Human Review Gate]
                              ↓
                        [Published]
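The dependency analysis above can be checked mechanically. A short sketch using Python's standard-library graphlib walks the stages in "waves" — every stage in the same wave is safe to run in parallel. The stage names mirror the diagram; nothing here is LangGraph-specific:

```python
from graphlib import TopologicalSorter

# Each stage maps to the stages it depends on, straight from the diagram
deps = {
    "web_research": set(),
    "academic_research": set(),
    "fact_check": {"web_research", "academic_research"},
    "synthesis": {"fact_check"},
    "writing": {"synthesis"},
    "human_review": {"writing"},
}

ts = TopologicalSorter(deps)
ts.prepare()

waves = []
while ts.is_active():
    ready = sorted(ts.get_ready())  # everything whose dependencies are satisfied
    waves.append(ready)
    ts.done(*ready)

print(waves[0])    # → ['academic_research', 'web_research'] — the parallel phase
print(len(waves))  # → 5: one parallel research wave, then four sequential stages
```

If a later design change accidentally introduces a cycle (say, synthesis feeding back into research), `prepare()` raises immediately — a cheap sanity check before any agent code exists.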

Step 3: Pattern Selection — Which Architecture?

Looking at the structure above, which pattern from Module 3.1 fits?

  • Sequential? No — the research phase can be parallel. Sequential would waste time.
  • Concurrent? Partially — but the phases after research are sequential and dependent.
  • Orchestrator-Worker? Close — but we have a clear structure with quality validation that needs supervision.
  • Supervisor? Yes — a supervisor that routes to parallel researchers, then passes synthesized results through the pipeline, with a human checkpoint at the end.

Decision: Supervisor pattern + Parallel execution for the research phase + Human-in-the-loop before publication.

This is the hybrid pattern principle from Module 3.1: use the right pattern for each stage, not one pattern for everything.


Step 4: Framework Selection — LangGraph or CrewAI?

Ask the key question: does the workflow structure emerge at runtime, or is it mostly known upfront?

The structure is mostly known: research in parallel → fact check → synthesize → write → human review. The only dynamic part is the supervisor deciding when research is complete enough.

This could work in CrewAI with a hierarchical process. But three things push toward LangGraph:

  1. Human-in-the-loop. We need to pause before publishing and wait for a human to review. LangGraph's interrupt pattern handles this natively. CrewAI's human_input=True is less controllable.
  2. Parallel execution. The Send API in LangGraph gives us true dynamic parallelism. CrewAI's parallelism is more limited.
  3. Checkpointing. A research run that takes 5 minutes can't afford to start over if something fails. LangGraph's PostgresSaver checkpoints after every node.
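The pause-and-resume behavior that makes human-in-the-loop work can be sketched without any framework at all. Here a plain Python generator stands in for the graph: it runs until the review gate, hands a payload to the human, and resumes with the human's decision. This is conceptual only — not LangGraph's actual API:

```python
def research_pipeline(topic: str):
    """Stand-in for the real pipeline: runs, then pauses for human review."""
    report = f"Draft report on {topic}"
    decision = yield {"awaiting_review": report}  # pause: hand control to a human
    if decision:
        return f"PUBLISHED: {report}"
    return f"REJECTED: {report}"

# Phase 1: run until the pipeline pauses at the review gate
run = research_pipeline("RAG systems")
pause = next(run)
print(pause["awaiting_review"])  # → Draft report on RAG systems

# Phase 2: resume with the human's decision
try:
    run.send(True)  # approve
except StopIteration as done:
    print(done.value)  # → PUBLISHED: Draft report on RAG systems
```

LangGraph's `interrupt` adds what this sketch lacks: the paused state is persisted to a checkpointer, so the resume can happen hours later, from a different process.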

Decision: LangGraph with langgraph-supervisor.


Step 5: State Design

What does shared state need to hold?

Python
from operator import add
from typing import Annotated

from langgraph.graph import MessagesState


class ResearchState(MessagesState):
    # The research topic
    topic: str

    # Accumulated findings from parallel researchers.
    # add reducer = results from both researchers get appended, not overwritten
    raw_findings: Annotated[list[str], add]

    # Fact check result
    verified_claims: list[str]

    # Synthesized findings
    synthesis: str

    # Final report
    report: str

    # Quality score from the reflection loop
    quality_score: float

    # Whether the human approved
    human_approved: bool

Why Annotated[list[str], add] for raw_findings? Because both the web researcher and academic researcher run in parallel and both write to this field. Without the add reducer, the second one to finish would overwrite the first. With add, they accumulate. This is the parallel state conflict gotcha from Module 3.2 — solved upfront in the design.
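To see exactly what the reducer buys us, here is the merge in isolation. `merge_update` is a hypothetical stand-in for the merge LangGraph performs internally, not a real framework function:

```python
from operator import add

def merge_update(state: dict, update: dict, reducers: dict) -> dict:
    """Hypothetical stand-in: merge one node's write into shared state."""
    merged = dict(state)
    for key, value in update.items():
        reducer = reducers.get(key)
        # With a reducer: combine old and new. Without one: overwrite.
        merged[key] = reducer(merged.get(key, []), value) if reducer else value
    return merged

reducers = {"raw_findings": add}
state = {"raw_findings": []}

# Both researchers finish and write to the same field
state = merge_update(state, {"raw_findings": ["web: 3 sources found"]}, reducers)
state = merge_update(state, {"raw_findings": ["arxiv: 2 papers found"]}, reducers)
print(state["raw_findings"])  # → both writes survive

# Same two writes with no reducer registered:
without = merge_update({"raw_findings": []}, {"raw_findings": ["web: 3 sources found"]}, {})
without = merge_update(without, {"raw_findings": ["arxiv: 2 papers found"]}, {})
print(without["raw_findings"])  # → only the last write — the first was lost
```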


Step 6: Model Routing — Assign the Right Model to Each Agent

Not every agent needs Opus. Match the model to the cognitive complexity of the task:

Agent               | Task complexity                   | Model
Web Researcher      | Find and extract information      | Haiku (fast, cheap)
Academic Researcher | Find and extract information      | Haiku (fast, cheap)
Fact Checker        | Compare claims to sources         | Sonnet (needs reasoning)
Synthesizer         | Find patterns, structure insights | Sonnet
Writer              | Produce polished output           | Sonnet
Supervisor          | Route and coordinate              | Sonnet
Opus is not used here. The tasks don't require it. If a future step — say, a complex legal analysis — genuinely needs deeper reasoning, that's when you upgrade that specific agent to Opus. Not before.

Estimated cost comparison:

  • All Opus: ~$0.45 per run
  • Routed as above: ~$0.08 per run
  • Same quality. Roughly 5.6x cheaper ($0.45 / $0.08 ≈ 5.6).
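A back-of-the-envelope cost model makes the comparison concrete. All numbers below — per-million-token prices and per-agent token counts — are illustrative assumptions for this sketch, not measured or published figures:

```python
# Assumed (input, output) USD per million tokens — illustrative only
PRICE_PER_MTOK = {
    "opus":   (15.00, 75.00),
    "sonnet": (3.00, 15.00),
    "haiku":  (1.00, 5.00),
}

# Assumed token usage per agent per run: (input_tokens, output_tokens)
USAGE = {
    "web_researcher":      (3_000, 300),
    "academic_researcher": (3_000, 300),
    "fact_checker":        (4_000, 400),
    "synthesizer":         (3_000, 500),
    "writer":              (3_000, 800),
    "supervisor":          (4_000, 300),
}

ROUTING = {
    "web_researcher": "haiku", "academic_researcher": "haiku",
    "fact_checker": "sonnet", "synthesizer": "sonnet",
    "writer": "sonnet", "supervisor": "sonnet",
}

def run_cost(model_for_agent) -> float:
    """Total cost of one run, given a function mapping agent name -> model."""
    total = 0.0
    for agent, (tok_in, tok_out) in USAGE.items():
        p_in, p_out = PRICE_PER_MTOK[model_for_agent(agent)]
        total += tok_in / 1e6 * p_in + tok_out / 1e6 * p_out
    return total

all_opus = run_cost(lambda a: "opus")
routed = run_cost(lambda a: ROUTING[a])
print(f"all Opus: ${all_opus:.2f}  routed: ${routed:.2f}  "
      f"({all_opus / routed:.1f}x cheaper)")
```

The exact ratio depends entirely on the assumed token counts; the structural point is that routing the high-volume research work to the cheapest capable model dominates the savings.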

Step 7: The Complete Annotated System

Now the code — with every design decision explained inline:

Python
import os
from operator import add
from typing import Annotated

from langchain.agents import create_react_agent  # LangGraph v1+
from langchain_anthropic import ChatAnthropic
from langchain_core.tools import tool
from langgraph.checkpoint.postgres import PostgresSaver
from langgraph.graph import MessagesState
from langgraph.types import Command, interrupt
from langgraph_supervisor import create_supervisor

# ─── Models — right model for right task ──────────────────────────────────
haiku = ChatAnthropic(model="claude-haiku-4-5-20251001")  # research workers
sonnet = ChatAnthropic(model="claude-sonnet-4-6")         # reasoning tasks

# ─── State ────────────────────────────────────────────────────────────────
class ResearchState(MessagesState):
    topic: str
    raw_findings: Annotated[list[str], add]  # add reducer for parallel writes
    verified_claims: list[str]
    synthesis: str
    report: str
    quality_score: float
    human_approved: bool

# ─── Tools ────────────────────────────────────────────────────────────────
@tool
def search_web(query: str) -> str:
    """Search the web for current information. Use for news, recent events,
    company data, and any information that may have changed recently."""
    # real implementation: Serper, Tavily, etc.
    return f"Web results for '{query}': ..."

@tool
def search_arxiv(query: str) -> str:
    """Search academic papers on ArXiv. Use for research findings,
    technical details, and peer-reviewed evidence."""
    # real implementation: arxiv API
    return f"Academic papers on '{query}': ..."

@tool
def verify_claim(claim: str, source: str) -> str:
    """Cross-check a specific claim against a source URL or known fact.
    Returns: VERIFIED, UNVERIFIED, or CONTRADICTED with explanation."""
    return f"Verification result for: '{claim}'"

# ─── Worker Agents ────────────────────────────────────────────────────────
# Haiku for research workers — searching is simple, cheap is fine
web_researcher = create_react_agent(
    model=haiku,
    tools=[search_web],
    name="web_researcher",
    prompt=(
        "You are a web research specialist. Search thoroughly for current "
        "information on the given topic. Find at least 3 authoritative sources. "
        "Always include source URLs in your findings."
    ),
)

academic_researcher = create_react_agent(
    model=haiku,
    tools=[search_arxiv],
    name="academic_researcher",
    prompt=(
        "You are an academic research specialist. Find peer-reviewed evidence "
        "and technical depth on the given topic. Focus on recent papers (2023+). "
        "Summarize key findings with paper citations."
    ),
)

fact_checker = create_react_agent(
    model=sonnet,  # fact-checking requires reasoning — upgrade to Sonnet
    tools=[verify_claim, search_web],
    name="fact_checker",
    prompt=(
        "You verify the accuracy of research claims. For each key claim, "
        "cross-check it against your sources. Mark claims as VERIFIED, "
        "UNVERIFIED, or CONTRADICTED. Only pass verified claims forward."
    ),
)

synthesizer = create_react_agent(
    model=sonnet,
    tools=[],
    name="synthesizer",
    prompt=(
        "You synthesize research findings into structured insights. "
        "Identify the 5 most important findings, spot patterns across sources, "
        "note any contradictions between web and academic findings, "
        "and organize everything logically."
    ),
)

writer = create_react_agent(
    model=sonnet,
    tools=[],
    name="writer",
    prompt=(
        "You write clear, well-structured research reports. "
        "Use the synthesized findings to produce a polished report with: "
        "executive summary, key findings, supporting evidence, "
        "and cited sources. Write for an intelligent non-specialist reader."
    ),
)

# ─── Supervisor ───────────────────────────────────────────────────────────
# Sonnet for the supervisor — routing requires moderate intelligence
workflow = create_supervisor(
    agents=[web_researcher, academic_researcher, fact_checker,
            synthesizer, writer],
    model=sonnet,
    output_mode="last_message",  # supervisor sees only final results, not full histories
    prompt=(
        "You coordinate a research team. Follow this sequence:\n"
        "1. Send the topic to BOTH web_researcher and academic_researcher simultaneously\n"
        "2. Once both return, send all findings to fact_checker\n"
        "3. Send verified claims to synthesizer\n"
        "4. Send synthesis to writer\n"
        "5. Return the final report\n"
        "Do not skip steps. Each agent's output feeds the next."
    ),
)

# ─── Human Review Node ────────────────────────────────────────────────────
# This node pauses execution — a human reviews before anything is published.
# Note: it must be registered on the graph under the name "human_review"
# for interrupt_before below to find it.
def human_review_gate(state: ResearchState):
    """Pause here for human review. The graph saves state and waits."""
    decision = interrupt({
        "message": "Research report ready for review. Approve for publication?",
        "report_preview": state["messages"][-1].content[:500] + "...",
    })
    return {"human_approved": decision}

# ─── Compile with Production Features ─────────────────────────────────────
DB_URI = os.environ["DATABASE_URL"]  # never hardcode credentials

with PostgresSaver.from_conn_string(DB_URI) as checkpointer:
    checkpointer.setup()  # creates required tables — run once at startup

    app = workflow.compile(
        checkpointer=checkpointer,          # crash recovery
        interrupt_before=["human_review"],  # pause before publishing
    )

    # ─── Run ──────────────────────────────────────────────────────────────
    config = {"configurable": {"thread_id": "research-001"}}

    # Phase 1: run the research pipeline
    result = app.invoke(
        {"messages": [{"role": "user",
                       "content": "Research the latest advances in RAG systems"}]},
        config,
    )

    # Execution pauses here — human reviews in the LangSmith UI or your app
    print("Pipeline paused. Report ready for review.")
    print(app.get_state(config).values["messages"][-1].content)

    # Phase 2: resume after human approval
    # Human approves:
    final = app.invoke(Command(resume=True), config)
    # Human rejects:
    # final = app.invoke(Command(resume=False), config)

Step 8: What We Applied From Each Module

Looking at this system, every module from Phase 3 contributed something:

What's in the system                                    | From module
Supervisor + Parallel pattern chosen deliberately       | 3.1 — Architecture Patterns
Annotated[list, add] reducer for parallel writes        | 3.2 — LangGraph Multi-Agent
create_supervisor + output_mode="last_message"          | 3.2 — LangGraph Multi-Agent
PostgresSaver checkpointing                             | 3.2 — LangGraph Multi-Agent
Human-in-the-loop interrupt                             | 3.2 — LangGraph Multi-Agent
Role-based agent specialization                         | 3.3 — CrewAI concepts (applied in LangGraph)
Model routing (Haiku for workers, Sonnet for reasoning) | 3.5 — Production Ops
LangSmith tracing (3 env vars, already active)          | 3.5 — Production Ops
max_concurrency on compiled graph                       | 3.5 — Production Ops

This is what it looks like when the concepts from each module become actual decisions in a real system. Each choice was made for a specific reason — not because a tutorial said so.


The Design Mindset

The most important thing this module is trying to show isn't the code — it's the thinking that precedes the code.

Every time you start building a multi-agent system, ask these questions in order:

  1. Do I need multi-agent at all? What's the simplest thing that could work?
  2. What are the natural agent boundaries? Where does specialization genuinely add value?
  3. What can run in parallel? What must be sequential? Draw the dependency graph.
  4. Which pattern fits? Sequential, Supervisor, Parallel, Swarm, or a combination?
  5. LangGraph or CrewAI? Do I need fine-grained control, or is speed to prototype the priority?
  6. What goes in shared state? Which fields need reducers?
  7. Which model for which agent? Match cognitive complexity to model capability.
  8. Where are the checkpoints and human gates? What can't be undone?
  9. What am I monitoring? Cost per run, failure rate, latency per agent.

Answer those nine questions before writing any code, and the implementation almost writes itself.

