Module 3.4
AutoGen & OpenAI Swarm
What Is AutoGen?
AutoGen (by Microsoft) is a multi-agent framework built around conversation. Where LangGraph thinks in graphs and CrewAI thinks in teams, AutoGen thinks in dialogue. Agents take turns talking to each other. They respond, call tools, optionally run code, and keep going until a termination condition is met.
The core mental model is simple: you define agents with personalities and capabilities, put them in a team, give them a task, and they converse until it's done. No state machines. No graph wiring. No crew configuration. Just agents talking.
This makes AutoGen the fastest framework to get something working — especially for tasks that are naturally iterative, like critique-and-revise loops, brainstorming sessions, or anything where agents need to react to each other over multiple turns.
Real-World Use Cases
Code review loops. A Writer agent produces code, a Critic agent reviews it and points out issues, the Writer revises, the Critic reviews again. This Generator + Critic pattern is one of the most useful things AutoGen does well — the conversation structure is exactly how real code review works.
Automated research with validation. A research agent gathers information, a fact-checking agent challenges its claims, the research agent defends or revises. The back-and-forth surfaces errors that a single-pass agent would miss.
Human-in-the-loop conversations. A UserProxyAgent can represent a real human in the conversation — pausing execution to ask for input when the agents hit uncertainty or need approval. AutoGen was one of the first frameworks to make this pattern easy.
Code execution workflows. AutoGen has built-in support for agents that actually run Python code in a sandboxed environment. An agent can write code, execute it, see the output, revise based on errors, and re-run — all autonomously.
Key Terms for This Module
AssistantAgent — an LLM-backed agent. Responds to messages, calls tools, reasons. The primary building block in AutoGen v0.4.
UserProxyAgent — represents a human (or an automated code executor). Can auto-reply, execute code, or wait for real human input.
ConversableAgent — the base class that both inherit from. Subclass this for full control over how an agent generates replies.
Team — the container that manages how agents take turns. Controls the conversation flow.
Termination condition — the rule that stops the conversation. Without one, agents loop forever. Always required.
RoundRobinGroupChat — a team type where agents take turns in a fixed rotation: Agent A → Agent B → Agent A → ...
SelectorGroupChat — a team type where an LLM reads the conversation and picks who speaks next based on context.
Swarm (AutoGen) — a team type where agents explicitly signal who should speak next using a HandoffMessage. Not the same as the swarm pattern from Module 3.1.
How AutoGen Works
Every AutoGen conversation has the same structure:
- A team is given a task
- Agents take turns based on the team type
- Each agent reads the full conversation history and generates a reply
- Tools are called if the agent decides to use them
- The loop continues until a termination condition fires
The key property: every agent sees the full conversation history every turn. This is what enables natural multi-turn collaboration — agents can reference what was said earlier, react to each other's reasoning, and build on previous responses. It also means each turn re-sends the entire history, so per-turn token cost grows with conversation length and the cumulative cost of a run grows roughly quadratically.
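To make the shared-history property concrete, here is a toy stand-in for the loop — plain Python with function "agents", not the AutoGen API. Every turn the active agent is handed the entire transcript, so the context it must read grows with each message:

```python
# Toy sketch of a round-robin conversation loop with shared history.
# The "agents" are plain functions; this is NOT the AutoGen API.

def writer(history):
    # Produce a new draft, numbered by how many drafts exist so far
    return f"draft v{sum(1 for m in history if m.startswith('draft')) + 1}"

def critic(history):
    # Approve after seeing two drafts, otherwise ask for a revision
    drafts = [m for m in history if m.startswith("draft")]
    return "APPROVE" if len(drafts) >= 2 else "please revise"

def run_round_robin(agents, max_messages=10):
    history = []
    prompt_sizes = []  # how much context each turn had to read
    turn = 0
    while len(history) < max_messages:
        agent = agents[turn % len(agents)]
        prompt_sizes.append(len(history))   # full history re-read every turn
        reply = agent(history)
        history.append(reply)
        if "APPROVE" in reply:              # termination condition
            break
        turn += 1
    return history, prompt_sizes

history, prompt_sizes = run_round_robin([writer, critic])
print(history)       # ['draft v1', 'please revise', 'draft v2', 'APPROVE']
print(prompt_sizes)  # [0, 1, 2, 3] — context grows every turn
```

The growing `prompt_sizes` list is the cost story in miniature: each entry is context that gets billed again on the next turn.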
The Three Team Types
Install the packages first (the OpenAI model client lives in the `openai` extra of autogen-ext):

```shell
pip install "autogen-agentchat" "autogen-ext[openai]"
```

RoundRobinGroupChat — fixed turn order:
```python
import asyncio

from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_agentchat.conditions import TextMentionTermination
from autogen_ext.models.openai import OpenAIChatCompletionClient

async def main():
    model = OpenAIChatCompletionClient(model="gpt-4o-mini")

    writer = AssistantAgent(
        name="writer",
        model_client=model,
        system_message=(
            "You write concise explanations of technical concepts. "
            "When you are satisfied with the critic's feedback, say APPROVE."
        ),
    )

    critic = AssistantAgent(
        name="critic",
        model_client=model,
        system_message=(
            "You review explanations for clarity, accuracy, and completeness. "
            "Be specific about what needs improvement."
        ),
    )

    # Conversation stops when "APPROVE" appears in any message
    termination = TextMentionTermination("APPROVE")
    team = RoundRobinGroupChat([writer, critic], termination_condition=termination)

    result = await team.run(
        task="Explain what a vector embedding is in 3 sentences."
    )

    # The final message in the conversation
    print(result.messages[-1].content)

asyncio.run(main())
```

What happens: writer produces a draft → critic responds with feedback → writer revises incorporating the feedback → critic reviews again → loop until the writer says "APPROVE." Clean, no graph wiring needed.
SelectorGroupChat — LLM picks who speaks:

```python
from autogen_agentchat.teams import SelectorGroupChat
from autogen_agentchat.conditions import MaxMessageTermination

# researcher, analyst, writer: AssistantAgents defined as above
# LLM reads the conversation and selects the most appropriate next speaker
team = SelectorGroupChat(
    [researcher, analyst, writer],
    model_client=model,
    termination_condition=MaxMessageTermination(20),
)
```

Use this when the right agent to speak next genuinely depends on context — not a fixed pattern. The LLM selector reads the conversation and decides.
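The third team type, Swarm, picks the next speaker from an explicit HandoffMessage rather than from rotation order or an LLM selector. The three selection rules can be sketched side by side in plain Python (assumed names, a toy stand-in — not AutoGen's implementation):

```python
import random  # stand-in for the LLM selector's judgment

def round_robin_next(agents, turn):
    # RoundRobinGroupChat: fixed rotation A -> B -> C -> A ...
    return agents[turn % len(agents)]

def selector_next(agents, history):
    # SelectorGroupChat: an LLM reads `history` and chooses;
    # random.choice stands in for that judgment call here
    return random.choice(agents)

def swarm_next(agents, last_message, current):
    # Swarm: a HandoffMessage names the next speaker explicitly;
    # with no handoff, the current speaker keeps the floor
    target = last_message.get("handoff_to")
    return target if target is not None else current

assert round_robin_next(["a", "b"], 3) == "b"
assert swarm_next(["a", "b"], {"handoff_to": "b"}, "a") == "b"
assert swarm_next(["a", "b"], {}, "a") == "a"
```

The trade-off is predictability: round-robin is fully deterministic, handoffs are deterministic given the agents' decisions, and the LLM selector is the most flexible but the hardest to predict.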
Multiple termination conditions — combine them:
```python
from autogen_agentchat.conditions import (
    TextMentionTermination,
    MaxMessageTermination,
)

# Stop if "DONE" appears OR if 15 messages have been exchanged
termination = TextMentionTermination("DONE") | MaxMessageTermination(15)
```

Always combine a semantic termination condition (a specific phrase) with a safety maximum message limit. The phrase handles successful completion. The limit handles runaway conversations.
Code Execution in AutoGen
AutoGen's standout feature compared to LangGraph and CrewAI is built-in, sandboxed code execution. An agent can write Python code, run it, see the output, and revise — all in one conversation.
```python
import asyncio

from autogen_agentchat.agents import AssistantAgent, CodeExecutorAgent
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_agentchat.conditions import TextMentionTermination
from autogen_ext.code_executors.local import LocalCommandLineCodeExecutor
from autogen_ext.models.openai import OpenAIChatCompletionClient

async def main():
    model = OpenAIChatCompletionClient(model="gpt-4o-mini")

    # Agent that writes code
    coder = AssistantAgent(
        name="coder",
        model_client=model,
        system_message=(
            "You write Python code to solve problems. "
            "Always put code inside ```python code blocks. "
            "When your code runs successfully and produces the right output, say DONE."
        ),
    )

    # Agent that executes the code and returns results
    executor = CodeExecutorAgent(
        name="executor",
        code_executor=LocalCommandLineCodeExecutor(work_dir="./code_workspace"),
    )

    termination = TextMentionTermination("DONE")
    team = RoundRobinGroupChat([coder, executor], termination_condition=termination)

    result = await team.run(
        task="Write Python code to find all prime numbers up to 100 and print them."
    )
    print(result.messages[-1].content)

asyncio.run(main())
```

The coder writes code, the executor runs it and returns the output, the coder sees the results and either declares success or revises. This pattern is powerful for data analysis tasks, algorithm validation, and any workflow where "write code and verify it works" is the goal.
For production: use Docker-based execution (DockerCommandLineCodeExecutor) instead of local execution. Code from an LLM should never run directly on your host system without sandboxing.
What Is OpenAI Swarm?
OpenAI Swarm is an experimental, educational Python framework that demonstrates one specific pattern: routines and handoffs. OpenAI explicitly states it is not intended for production use — it's a teaching tool that makes the handoff pattern visible and understandable.
Why does it matter then? Because the handoff pattern it demonstrates is one of the most important patterns in multi-agent systems — and understanding it at this simple level makes the production implementations in LangGraph and CrewAI much easier to reason about.
The Two Core Concepts of Swarm
Routines are system prompts that define an agent's behavior as a series of steps. A customer service routine might say: "1. Ask the customer what they need. 2. If it's a refund request, collect the order number. 3. ONLY if you cannot resolve it, transfer to the refunds agent." The agent follows these steps naturally because they're in its system prompt.
Handoffs are how one agent passes full control to another. The key mechanism: a handoff function simply returns an Agent object. When the model calls that function, the framework detects the return type and switches the active agent — carrying the full conversation history with it.
```python
from swarm import Swarm, Agent

client = Swarm()

# The handoff function — returns an Agent object to trigger transfer
def transfer_to_billing():
    """Transfer the conversation to the billing specialist."""
    return billing_agent

support_agent = Agent(
    name="Support",
    instructions=(
        "You handle general customer support. "
        "If the customer has a billing question, transfer to billing."
    ),
    functions=[transfer_to_billing],  # handoff is just a function
)

billing_agent = Agent(
    name="Billing",
    instructions="You handle billing questions and refund requests.",
    functions=[],
)

# Run a conversation
response = client.run(
    agent=support_agent,
    messages=[{"role": "user", "content": "I want a refund for my last order"}],
)

print(response.messages[-1]["content"])
# The billing agent handled this — full context was transferred
```

The elegance of this: the handoff is just a function that returns an agent. Nothing special. The framework detects the return type and handles the switch. This is the conceptual foundation that LangGraph's Command(goto=agent_name) and CrewAI's delegation are both building on.
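The detection mechanism itself fits in a few lines. This toy loop (plain Python with assumed names, not Swarm's actual source) shows the whole trick: call the function the model asked for, and if the return value is an Agent, make it the active agent:

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    functions: list = field(default_factory=list)

billing_agent = Agent(name="Billing")

def transfer_to_billing():
    return billing_agent

support_agent = Agent(name="Support", functions=[transfer_to_billing])

def handle_tool_call(active_agent, func_name):
    """Run a tool call; switch agents if it returned an Agent."""
    func = next(f for f in active_agent.functions if f.__name__ == func_name)
    result = func()
    if isinstance(result, Agent):   # the entire handoff mechanism
        return result               # new active agent, same history
    return active_agent             # ordinary tool call: agent unchanged

active = handle_tool_call(support_agent, "transfer_to_billing")
print(active.name)  # → Billing
```

Everything else in Swarm — routines, context variables, the chat loop — is scaffolding around this one `isinstance` check.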
The Full Framework Picture
Now that you've seen all four, here's how they relate:
| | LangGraph | CrewAI | AutoGen | OpenAI Swarm |
|---|---|---|---|---|
| Mental model | State machine / graph | Role-based team | Conversation / dialogue | Routines + handoffs |
| Learning curve | High | Low | Medium | Very low |
| Production ready | Yes (v1.0) | Yes | Yes (v0.4) | No (educational) |
| Parallelism | Yes (Send, static edges) | Limited | No (sequential turns) | No |
| Code execution | Via Bash tool | Via CodeInterpreterTool | Built-in (sandboxed) | No |
| Human-in-the-loop | Interrupt pattern | human_input=True | UserProxyAgent | Manual |
| Best for | Complex branching, compliance, production scale | Fast prototyping, role pipelines | Conversational loops, code tasks | Learning the pattern |
The decision in practice:
- Building something for production that needs reliability and auditability → LangGraph
- Prototyping a multi-agent pipeline quickly → CrewAI
- Building a critique/revision loop or code execution workflow → AutoGen
- Learning how the handoff pattern works → OpenAI Swarm
Most real projects use LangGraph or CrewAI for production. AutoGen is genuinely useful for specific use cases (code execution, conversational agents). OpenAI Swarm is for learning — not shipping.
Where Things Go Wrong
AutoGen token costs spiral. Every agent sees the full conversation history on every turn. A 10-round loop between 3 agents means 30 LLM calls, each one re-sending the entire history so far. Without a MaxMessageTermination, costs compound fast. Always set one.
AutoGen v0.4 is a breaking redesign. Most tutorials online use the old v0.2 API — initiate_chat(), ConversableAgent as the main class. The new v0.4 API uses AssistantAgent and team classes. They are not compatible. Don't mix them.
SelectorGroupChat can surprise you. When the LLM picks who speaks next, the conversation can go in unexpected directions. Harder to predict and debug than RoundRobinGroupChat. Start with round-robin, switch to selector only when the routing logic genuinely needs to be dynamic.
OpenAI Swarm missing features. Swarm has no persistent memory, no async support, no streaming, no production-grade error handling. Use it to learn the pattern — then implement that pattern in LangGraph or build on the OpenAI Agents SDK for production.
Sources
- AutoGen AgentChat Quickstart
- AutoGen Teams Documentation
- OpenAI Swarm GitHub
- DataCamp: CrewAI vs LangGraph vs AutoGen