Module 3.3
CrewAI
What Is CrewAI?
CrewAI is an open-source framework for building teams of AI agents that collaborate like a real work crew. You describe who each agent is — their role, their goal, their personality — and what you need done. CrewAI handles the coordination, the information flow, and the execution.
Where LangGraph thinks in graphs and state machines, CrewAI thinks in people and teams. You're not wiring nodes and edges — you're hiring specialists and assigning them work. A researcher. A writer. A reviewer. A data analyst. Each agent has a defined role and a clear objective. You put them in a crew, give the crew a task, and let them work.
This is why CrewAI has a much lower learning curve than LangGraph. The concepts map directly to how teams actually work. If you can describe a team in words, you can build it in CrewAI.
Real-World Use Cases
PwC — code generation pipeline. A crew of agents collaborates on software tasks: one writes the code, one reviews it, one checks security, one writes documentation. Before CrewAI, their code generation accuracy was around 10%. After deploying a properly structured crew, it reached 70%. The coordination — one agent's review feeding into another's revision — made the difference.
DocuSign — lead qualification. Multiple agents extract data from different internal systems (CRM, product usage database, marketing platform), then a synthesis agent scores and qualifies the lead. No single agent had access to all the data, but together they could paint a complete picture. Result: dramatically faster time-to-first-contact with qualified leads.
Gelato — lead enrichment. Agents gather company size, printer infrastructure details, and revenue estimates from multiple public and private sources. A synthesis agent combines this into an enriched lead profile. The specialization — one agent per data source — meant each agent could be optimized for exactly what it needed to find.
General enterprise pattern: Most companies start with internal process automation (30–60 days to production), then expand to customer-facing use cases once they trust the system.
Key Terms for This Module
Role — the job title and function of an agent. "Senior Data Researcher." "Code Review Specialist." The model shapes its behavior around this description.
Goal — what the agent is trying to achieve. Specific and outcome-oriented. "Uncover the top 5 trends in {topic} with supporting evidence."
Backstory — the agent's background and personality. Shapes how it approaches problems and communicates. "You are a veteran researcher who always cites sources and flags uncertainty."
Task — a specific assignment. Has a description of what to do and an expected_output describing what the result should look like.
Crew — the team. Contains agents and tasks. Has a process defining execution order.
Process — how the crew executes. Either Sequential (tasks run one after another in order) or Hierarchical (a manager agent dynamically allocates tasks).
Flow — the production orchestration layer that wraps Crews with deterministic control: branching, loops, conditional logic, and explicit state management.
Delegation — when an agent decides it needs help and hands a sub-task to another agent in the crew. Only works when allow_delegation=True.
Guardrail — a validation function attached to a task. If the output doesn't pass, the task retries automatically (up to 3 times by default).
The Role-Playing Trinity
The three most important fields in any CrewAI agent are role, goal, and backstory. Together they shape how the underlying LLM behaves for the entire duration of the task.
This works because LLMs respond powerfully to persona framing. An agent told it's a "veteran data analyst" with "10 years of experience identifying patterns" will approach problems differently than one told it's a "general assistant." The backstory gives the model context about how to reason, what to prioritize, and how to communicate.
```python
from crewai import Agent

researcher = Agent(
    role="Senior Data Researcher",
    goal="Uncover cutting-edge developments in {topic} with evidence from authoritative sources",
    backstory=(
        "You are a veteran research analyst with 10 years of experience. "
        "You always cite your sources, flag uncertainty clearly, and "
        "organize findings so others can act on them immediately."
    ),
    llm="claude-opus-4-6",
    tools=[search_tool, scrape_tool],
    max_iter=10,  # max attempts before forced answer
    verbose=True,
    allow_delegation=False,  # this agent doesn't hand off work
)
```

The interpolation trick: `{topic}` in the goal is a template variable. When you call `crew.kickoff(inputs={"topic": "AI agents in 2025"})`, it fills in automatically. This makes agents reusable for different inputs without rewriting them.
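The mechanic behind this is ordinary string templating. A minimal sketch in plain Python (no CrewAI required; `interpolate` is a hypothetical stand-in, since the real framework performs the substitution internally at kickoff):

```python
# Toy illustration of how {topic}-style placeholders resolve from kickoff inputs.
goal_template = "Uncover cutting-edge developments in {topic} with supporting evidence"

def interpolate(template: str, inputs: dict) -> str:
    # str.format_map fills each {name} placeholder from the inputs dict
    return template.format_map(inputs)

goal = interpolate(goal_template, {"topic": "AI agents in 2025"})
print(goal)
# → Uncover cutting-edge developments in AI agents in 2025 with supporting evidence
```

The payoff is reuse: the same agent definition serves every topic you pass in, so agents behave like parameterized templates rather than one-off prompts.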
Tasks — Defining the Work
A task is a specific, scoped assignment. The description tells the agent what to do. The expected_output tells it what the result should look like. Both matter.
```python
from crewai import Task

research_task = Task(
    description=(
        "Research {topic} thoroughly. Find at least 5 authoritative sources. "
        "Identify key developments, major players, and emerging trends. "
        "Collect the URLs of every source you use."
    ),
    expected_output=(
        "A comprehensive research summary with: key facts, statistics, "
        "notable quotes, and source URLs organized by subtopic."
    ),
    agent=researcher,
)
```

Piping outputs between tasks — the `context` parameter makes one task's output available to another:
```python
synthesis_task = Task(
    description=(
        "Synthesize the research into a structured report. "
        "Identify the 5 most important findings and 3 recommended actions."
    ),
    expected_output="A structured report with findings and recommendations.",
    context=[research_task],  # researcher's output injected here automatically
    agent=analyst,
)
```

Structured output with Pydantic — instead of free text, get a typed object back:
```python
from pydantic import BaseModel
from typing import List

class ResearchReport(BaseModel):
    topic: str
    key_findings: List[str]
    sources: List[str]
    recommended_actions: List[str]
    confidence: str  # "High" | "Medium" | "Low"

synthesis_task = Task(
    description="...",
    expected_output="A structured research report.",
    context=[research_task],
    agent=analyst,
    output_pydantic=ResearchReport,  # output is now a typed object
)
```

Guardrails — automatic retry on bad output:
```python
def validate_has_sources(output) -> tuple[bool, str]:
    if "http" not in output.raw:
        return False, "Output must include at least one URL source."
    return True, output

research_task = Task(
    description="...",
    expected_output="...",
    guardrails=[validate_has_sources],  # retries up to 3x if validation fails
    agent=researcher,
)
```
)This is enormously useful in production. Instead of getting a beautiful but uncited report, the task keeps retrying until the output actually meets your quality criteria.
Crews and Processes
A Crew is the team — it holds the agents, tasks, and execution strategy.
Sequential process (the default):
Tasks run in the order you define them. Each task's output automatically becomes context for the next. Deterministic, predictable, easy to debug.
```python
from crewai import Crew, Process

crew = Crew(
    agents=[researcher, analyst, writer],
    tasks=[research_task, synthesis_task, writing_task],
    process=Process.sequential,
    memory=True,  # enable shared memory across agents
    verbose=True,
    output_log_file="output/run_log.json",  # full audit trail
)

result = crew.kickoff(inputs={"topic": "multi-agent AI systems"})

# Access results
print(result.raw)          # plain text output
print(result.pydantic)     # structured output (if output_pydantic was set)
print(result.token_usage)  # how much this run cost in tokens
```

Hierarchical process — when you want a manager:
A manager agent is automatically created (or you provide your own). The manager dynamically decides which agent gets which task based on their capabilities. Tasks are NOT pre-assigned to agents — the manager allocates at runtime.
```python
crew = Crew(
    agents=[researcher, analyst, writer, data_engineer],
    tasks=[task1, task2, task3, task4],  # not pre-assigned
    process=Process.hierarchical,
    manager_llm="claude-opus-4-6",  # capable model for the manager
    verbose=True,
)
```

When sequential, when hierarchical:
| | Sequential | Hierarchical |
|---|---|---|
| Task assignment | Fixed upfront | Dynamic at runtime |
| Predictability | High | Lower |
| Debugging | Easy | Harder |
| Best for | Clear linear pipelines | Ambiguous tasks, dynamic allocation |
| Cost | Predictable | Higher (manager LLM adds calls) |
The key question: do you know exactly which agent should do which task? If yes — sequential. If the right agent depends on what the task turns out to be — hierarchical.
Tools — Giving Agents Hands
Without tools, agents can only reason and write text. Tools let them take actions in the world: search the web, read files, query databases, execute code.
The @tool decorator — the simplest approach:
```python
from crewai.tools import tool

@tool("Web Search")
def web_search(query: str) -> str:
    """Search the web for current information on a topic.
    Use when you need recent data, news, or live information."""
    # your real implementation here
    return search_results

@tool("Database Query")
def query_db(sql: str) -> str:
    """Execute a read-only SQL query against the analytics database.
    Returns results as formatted text. Use for historical data and metrics."""
    return run_query(sql)
```

The BaseTool subclass — full control:
```python
from crewai.tools import BaseTool
from pydantic import BaseModel, Field
from typing import Type

class SearchInput(BaseModel):
    query: str = Field(..., description="The search query to execute")
    max_results: int = Field(default=5, description="Number of results to return")

class WebSearchTool(BaseTool):
    name: str = "Web Search"
    description: str = (
        "Search the web for current information. Returns titles, URLs, and snippets. "
        "Use for recent events, current data, or anything that may have changed recently."
    )
    args_schema: Type[BaseModel] = SearchInput

    def _run(self, query: str, max_results: int = 5) -> str:
        return perform_search(query, max_results)

    async def _arun(self, query: str, max_results: int = 5) -> str:
        return await async_search(query, max_results)

    def cache_function(self, args, result):
        # Cache search results to avoid duplicate API calls
        return True
```

30+ built-in tools are available out of the box: SerperDevTool (Google search), ScrapeWebsiteTool, PDFSearchTool, CSVSearchTool, CodeInterpreterTool, PGSearchTool (PostgreSQL), GithubSearchTool, and more. You don't need to implement common integrations from scratch.
The docstring is the tool description. The model reads it to decide when to call the tool. Vague docstrings lead to wrong tool calls. Treat them with the same care as Anthropic's tool description best practices.
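To see why the docstring matters, here is a toy sketch of what a decorator like `@tool` plausibly does: capture the function's name and docstring as the metadata the model reads when deciding which tool to call. This is an illustration of the mechanic, not CrewAI's actual implementation.

```python
import functools

def tool(name: str):
    """Toy @tool decorator: attaches name + docstring as model-facing metadata."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            return fn(*args, **kwargs)
        wrapper.tool_name = name
        wrapper.tool_description = (fn.__doc__ or "").strip()  # what the model reads
        return wrapper
    return decorator

@tool("Web Search")
def web_search(query: str) -> str:
    """Search the web for current information on a topic."""
    return f"results for {query}"

print(web_search.tool_name)         # → Web Search
print(web_search.tool_description)  # → Search the web for current information on a topic.
```

If the docstring were just "searches stuff", the model would have no way to distinguish this tool from a database lookup, which is exactly how wrong tool calls happen.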
Memory — Agents That Remember
By default, each crew.kickoff() starts fresh. With memory enabled, agents can recall what they learned in previous runs — across sessions.
```python
crew = Crew(
    agents=[researcher, analyst],
    tasks=[...],
    memory=True,
    embedder={
        "provider": "openai",
        "config": {"model": "text-embedding-3-small"}
    }
)
```

CrewAI uses a unified Memory class that retrieves memories by ranking them on three dimensions simultaneously: semantic similarity (is this memory relevant to the current query?), recency (how recent is it?), and importance (how significant was it?). Each dimension has a configurable weight.
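The three-way ranking can be sketched as a weighted sum. This is illustrative only: the weight values and function name here are assumptions, not CrewAI's internal fields.

```python
def memory_score(similarity: float, recency: float, importance: float,
                 w_sim: float = 0.5, w_rec: float = 0.3, w_imp: float = 0.2) -> float:
    """Combine the three dimensions (each normalized to 0..1) into one ranking score."""
    return w_sim * similarity + w_rec * recency + w_imp * importance

# A highly relevant but old memory vs. a recent, less relevant one
old_relevant = memory_score(similarity=0.9, recency=0.1, importance=0.7)  # ≈ 0.62
recent_vague = memory_score(similarity=0.4, recency=0.9, importance=0.3)  # ≈ 0.53
print(old_relevant > recent_vague)  # → True
```

The point of the weights is exactly this kind of trade-off: with similarity weighted highest, an old but on-topic finding can still outrank a fresh but tangential one.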
The practical effect: after your research crew runs on "LLM agents" today, when it runs on "multi-agent systems" tomorrow, the researcher will automatically recall relevant findings from yesterday's run without being explicitly told to.
Storage defaults to LanceDB at ./.crewai/memory. In production, configure this to a persistent store so memories survive restarts.
Flows — The Production Orchestration Layer
Crews handle the autonomous, intelligent work. Flows handle the structure around it — conditional logic, loops, quality gates, branching.
The production pattern: Flow manages the application; Crews do the intelligent work inside Flow steps.
```python
from crewai.flow.flow import Flow, listen, start
from pydantic import BaseModel

class ContentState(BaseModel):
    topic: str = ""
    report: str = ""
    quality_score: float = 0.0
    approved: bool = False

class ContentFlow(Flow[ContentState]):
    @start()
    def initialize(self):
        print(f"Starting content creation for: {self.state.topic}")

    @listen(initialize)
    def run_research_crew(self):
        result = research_crew.kickoff(inputs={"topic": self.state.topic})
        self.state.report = result.raw

    @listen(run_research_crew)
    def quality_check(self):
        # Deterministic quality gate — not LLM-driven
        word_count = len(self.state.report.split())
        has_sources = "http" in self.state.report
        if word_count < 300 or not has_sources:
            print("Quality check failed — re-running research")
            return self.run_research_crew()  # conditional retry loop
        self.state.approved = True

    @listen(quality_check)
    def publish(self):
        if self.state.approved:
            with open("output/report.md", "w") as f:
                f.write(self.state.report)
            print("Report published.")

flow = ContentFlow()
flow.kickoff(inputs={"topic": "multi-agent AI systems in 2025"})
```

Why Flows matter: Without a Flow, your Crew runs once and you get whatever it produces. With a Flow, you can loop until quality criteria are met, branch based on what the crew found, chain multiple crews in sequence, and recover gracefully from failures — all with explicit, readable Python logic instead of hoping the LLM makes the right decisions.
CrewAI vs LangGraph — The Honest Comparison
| | CrewAI | LangGraph |
|---|---|---|
| Mental model | Role-based team | State machine / graph |
| Learning curve | Low | High |
| Time to first working system | Hours | Days |
| Flexibility | Lower | Higher |
| State management | Crew context + memory | Explicit TypedDict with reducers |
| Debugging | Verbose logs, task replay | Time-travel, breakpoints, checkpoints |
| Human-in-the-loop | human_input=True on tasks | Built-in interrupt pattern |
| Production stability | Stable | v1.0 (Oct 2025), battle-tested |
| Monthly downloads | 1.38M | 6.17M |
The 2025 emerging pattern: "Prototype with CrewAI, productionize with LangGraph." Some teams even nest CrewAI Crews inside LangGraph nodes — autonomous agent collaboration within a well-controlled LangGraph execution graph.
When to pick CrewAI:
- Your problem maps naturally to a team of role-based specialists
- You need to move fast (days, not weeks)
- Workflow is mostly linear with occasional delegation
- Your team isn't deeply familiar with state machines and graph theory
When to pick LangGraph:
- Fine-grained control over every transition is required
- Conditional branching or cycles are core to the design
- Compliance, auditability, or exact audit trails are required
- Scaling to millions of executions per month
Where Things Go Wrong
Cost explosion. A 4-agent crew on Claude Opus can cost 10–50x a single API call. Every agent's full conversation history gets sent on every iteration. Use max_iter to hard-cap loops, route cheaper tasks to smaller models (llm="claude-haiku-4-5-20251001"), and enable cache=True to avoid duplicate tool calls.
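Because the full history is resent on every iteration, input tokens grow roughly quadratically with iteration count. A back-of-envelope sketch (all numbers hypothetical):

```python
def total_input_tokens(iterations: int, tokens_per_turn: int) -> int:
    """Each iteration resends the entire history so far, so input cost accumulates."""
    history = 0
    total = 0
    for _ in range(iterations):
        history += tokens_per_turn  # one more turn appended to the conversation
        total += history            # the whole history is sent as input again
    return total

# One agent, 10 iterations at ~500 tokens/turn vs. a single one-shot call
print(total_input_tokens(1, 500))   # → 500
print(total_input_tokens(10, 500))  # → 27500  (55x the one-shot input cost)
```

Multiply that by four agents and an expensive model and the 10-50x figure above stops looking surprising; this is why `max_iter` caps and cheaper models for routine tasks matter so much.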
Non-determinism. The same crew doesn't always produce the same result. For critical tasks, use output_pydantic for structured output and attach guardrails. Free-text output with no validation is the leading cause of downstream failures.
Manager bottleneck in hierarchical mode. Every task routes through the manager LLM. A confused manager breaks the entire crew. Use a capable model for the manager, write an explicit and detailed manager backstory, and test manager routing in isolation before running the full crew.
Context overflow in long runs. Enable respect_context_window=True (auto-summarizes when approaching token limits) and break long workflows into multiple sequential Crew invocations via a Flow rather than one enormous single run.
Observability gap. CrewAI's built-in logging is basic. For production, connect LangSmith or Langfuse using step_callback and task_callback hooks. You cannot reliably improve what you can't see.
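The callbacks themselves are plain Python functions. A minimal sketch of what you might hand to those hooks (the record fields and log destination here are assumptions; adapt them to whatever your tracing backend expects):

```python
import json
import time

def log_step(step_output) -> dict:
    """Intended as a step callback: runs after every agent step."""
    record = {
        "ts": time.time(),
        "type": "step",
        "payload": str(step_output),  # payload shape varies by version; stringify defensively
    }
    print(json.dumps(record))  # replace with your LangSmith/Langfuse client call
    return record

def log_task(task_output) -> dict:
    """Intended as a task callback: runs once per completed task."""
    record = {"ts": time.time(), "type": "task", "payload": str(task_output)}
    print(json.dumps(record))
    return record

# Wiring sketch, assuming the Crew constructor accepts these hooks:
# crew = Crew(agents=[...], tasks=[...], step_callback=log_step, task_callback=log_task)
```

Even this minimal version gives you a timestamped, greppable trail of every step and task, which is the baseline you need before any serious debugging or cost analysis.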
Sources
- CrewAI Documentation
- CrewAI Agents
- CrewAI Tasks
- CrewAI Memory
- ZenML: LangGraph vs CrewAI
- DataCamp: CrewAI vs LangGraph vs AutoGen
- CrewAI Production Insights