AI Engineering Curriculum
Phase 3: Multi-Agent Systems

Module 3.4

AutoGen & OpenAI Swarm

What Is AutoGen?

AutoGen (by Microsoft) is a multi-agent framework built around conversation. Where LangGraph thinks in graphs and CrewAI thinks in teams, AutoGen thinks in dialogue. Agents take turns talking to each other. They respond, call tools, optionally run code, and keep going until a termination condition is met.

The core mental model is simple: you define agents with personalities and capabilities, put them in a team, give them a task, and they converse until it's done. No state machines. No graph wiring. No crew configuration. Just agents talking.

This makes AutoGen the fastest framework to get something working — especially for tasks that are naturally iterative, like critique-and-revise loops, brainstorming sessions, or anything where agents need to react to each other over multiple turns.


Real-World Use Cases

Code review loops. A Writer agent produces code, a Critic agent reviews it and points out issues, the Writer revises, the Critic reviews again. This Generator + Critic pattern is one of the most useful things AutoGen does well — the conversation structure is exactly how real code review works.

Automated research with validation. A research agent gathers information, a fact-checking agent challenges its claims, the research agent defends or revises. The back-and-forth surfaces errors that a single-pass agent would miss.

Human-in-the-loop conversations. A UserProxyAgent can represent a real human in the conversation — pausing execution to ask for input when the agents hit uncertainty or need approval. AutoGen was one of the first frameworks to make this pattern easy.

Code execution workflows. AutoGen has built-in support for agents that actually run Python code in a sandboxed environment. An agent can write code, execute it, see the output, revise based on errors, and re-run — all autonomously.


Key Terms for This Module

AssistantAgent — an LLM-backed agent. Responds to messages, calls tools, reasons. The primary building block in AutoGen v0.4.

UserProxyAgent — represents a human (or an automated code executor). Can auto-reply, execute code, or wait for real human input.

ConversableAgent — the base class both agent types inherited from in the older v0.2 API. In v0.4, custom agents subclass BaseChatAgent instead for full control over how replies are generated.

Team — the container that manages how agents take turns. Controls the conversation flow.

Termination condition — the rule that stops the conversation. Without one, agents loop forever. Always required.

RoundRobinGroupChat — a team type where agents take turns in a fixed rotation: Agent A → Agent B → Agent A → ...

SelectorGroupChat — a team type where an LLM reads the conversation and picks who speaks next based on context.

Swarm (AutoGen) — a team type where agents explicitly signal who should speak next using a HandoffMessage. Not the same as the swarm pattern from Module 3.1.


How AutoGen Works

Every AutoGen conversation has the same structure:

  1. A team is given a task
  2. Agents take turns based on the team type
  3. Each agent reads the full conversation history and generates a reply
  4. Tools are called if the agent decides to use them
  5. The loop continues until a termination condition fires

The key property: every agent sees the full conversation history every turn. This is what enables natural multi-turn collaboration — agents can reference what was said earlier, react to each other's reasoning, and build on previous responses. It also means each turn's input grows linearly with conversation length, so cumulative token cost compounds quadratically.
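To see why this compounds, here is a back-of-the-envelope cost model in plain Python. The per-message token count is an illustrative assumption, not a real measurement:

```python
# Rough cost model: each turn re-sends the entire history as input.
# Assume every message is ~200 tokens (an illustrative number).
TOKENS_PER_MESSAGE = 200

def cumulative_input_tokens(num_turns: int) -> int:
    """Total input tokens when turn k re-reads the k-1 messages before it."""
    total = 0
    for turn in range(1, num_turns + 1):
        history = turn - 1  # messages already in the conversation
        total += history * TOKENS_PER_MESSAGE
    return total

# Doubling the conversation length roughly quadruples total input tokens:
print(cumulative_input_tokens(10))  # 9000
print(cumulative_input_tokens(20))  # 38000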


The Three Team Types

Shell
pip install autogen-agentchat "autogen-ext[openai]"

RoundRobinGroupChat — fixed turn order:

Python
import asyncio

from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_agentchat.conditions import TextMentionTermination
from autogen_ext.models.openai import OpenAIChatCompletionClient

async def main():
    model = OpenAIChatCompletionClient(model="gpt-4o-mini")

    writer = AssistantAgent(
        name="writer",
        model_client=model,
        system_message=(
            "You write concise explanations of technical concepts. "
            "When you are satisfied with the critic's feedback, say APPROVE."
        ),
    )
    critic = AssistantAgent(
        name="critic",
        model_client=model,
        system_message=(
            "You review explanations for clarity, accuracy, and completeness. "
            "Be specific about what needs improvement."
        ),
    )

    # Conversation stops when "APPROVE" appears in any message
    termination = TextMentionTermination("APPROVE")
    team = RoundRobinGroupChat([writer, critic], termination_condition=termination)

    result = await team.run(
        task="Explain what a vector embedding is in 3 sentences."
    )
    # The final message in the conversation
    print(result.messages[-1].content)

asyncio.run(main())

What happens: writer produces a draft → critic responds with feedback → writer revises incorporating the feedback → critic reviews again → loop until writer says "APPROVE." Clean, no graph wiring needed.

SelectorGroupChat — LLM picks who speaks:

Python
from autogen_agentchat.teams import SelectorGroupChat
from autogen_agentchat.conditions import MaxMessageTermination

# LLM reads the conversation and selects the most appropriate next speaker
team = SelectorGroupChat(
    [researcher, analyst, writer],
    model_client=model,
    termination_condition=MaxMessageTermination(20),
)

Use this when the right agent to speak next genuinely depends on context — not a fixed pattern. The LLM selector reads the conversation and decides.

Multiple termination conditions — combine them:

Python
from autogen_agentchat.conditions import (
    TextMentionTermination,
    MaxMessageTermination,
)

# Stop if "DONE" appears OR if 15 messages have been exchanged
termination = TextMentionTermination("DONE") | MaxMessageTermination(15)

Always combine a semantic termination condition (a specific phrase) with a safety maximum message limit. The phrase handles successful completion. The limit handles runaway conversations.


Code Execution in AutoGen

AutoGen's standout feature compared to LangGraph and CrewAI is built-in, sandboxed code execution. An agent can write Python code, run it, see the output, and revise — all in one conversation.

Python
from autogen_ext.code_executors.local import LocalCommandLineCodeExecutor
from autogen_agentchat.agents import CodeExecutorAgent, AssistantAgent

# Agent that writes code
coder = AssistantAgent(
    name="coder",
    model_client=model,
    system_message=(
        "You write Python code to solve problems. "
        "Always put code inside ```python code blocks. "
        "When your code runs successfully and produces the right output, say DONE."
    ),
)

# Agent that executes the code and returns results
executor = CodeExecutorAgent(
    name="executor",
    code_executor=LocalCommandLineCodeExecutor(work_dir="./code_workspace"),
)

termination = TextMentionTermination("DONE")
team = RoundRobinGroupChat([coder, executor], termination_condition=termination)

result = await team.run(
    task="Write Python code to find all prime numbers up to 100 and print them."
)

The coder writes code, the executor runs it and returns the output, the coder sees the results and either declares success or revises. This pattern is powerful for data analysis tasks, algorithm validation, and any workflow where "write code and verify it works" is the goal.

For production: use Docker-based execution (DockerCommandLineCodeExecutor) instead of local execution. Code from an LLM should never run directly on your host system without sandboxing.
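A minimal sketch of that swap, hedged: it assumes Docker is running locally and that the autogen-ext Docker extra is installed; the helper name is illustrative:

```python
from autogen_agentchat.agents import CodeExecutorAgent
from autogen_ext.code_executors.docker import DockerCommandLineCodeExecutor

async def make_sandboxed_executor() -> CodeExecutorAgent:
    # Generated code runs inside a container, not on the host
    code_executor = DockerCommandLineCodeExecutor(work_dir="./code_workspace")
    await code_executor.start()  # starts the container; call .stop() when the team is done
    return CodeExecutorAgent(name="executor", code_executor=code_executor)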


What Is OpenAI Swarm?

OpenAI Swarm is an experimental, educational Python framework that demonstrates one specific pattern: routines and handoffs. OpenAI explicitly states it is not intended for production use — it's a teaching tool that makes the handoff pattern visible and understandable.

Why does it matter then? Because the handoff pattern it demonstrates is one of the most important patterns in multi-agent systems — and understanding it at this simple level makes the production implementations in LangGraph and CrewAI much easier to reason about.


The Two Core Concepts of Swarm

Routines are system prompts that define an agent's behavior as a series of steps. A customer service routine might say: "1. Ask the customer what they need. 2. If it's a refund request, collect the order number. 3. ONLY if you cannot resolve it, transfer to the refunds agent." The agent follows these steps naturally because they're in its system prompt.

Handoffs are how one agent passes full control to another. The key mechanism: a handoff function simply returns an Agent object. When the model calls that function, the framework detects the return type and switches the active agent — carrying the full conversation history with it.

Python
from swarm import Swarm, Agent

client = Swarm()

# The handoff function: returns an Agent object to trigger transfer
def transfer_to_billing():
    """Transfer the conversation to the billing specialist."""
    return billing_agent

support_agent = Agent(
    name="Support",
    instructions=(
        "You handle general customer support. "
        "If the customer has a billing question, transfer to billing."
    ),
    functions=[transfer_to_billing],  # handoff is just a function
)

billing_agent = Agent(
    name="Billing",
    instructions="You handle billing questions and refund requests.",
    functions=[],
)

# Run a conversation
response = client.run(
    agent=support_agent,
    messages=[{"role": "user", "content": "I want a refund for my last order"}],
)
print(response.messages[-1]["content"])
# The billing agent handled this — full context was transferred

The elegance of this: the handoff is just a function that returns an agent. Nothing special. The framework detects the return type and handles the switch. This is the conceptual foundation that LangGraph's Command(goto=agent_name) and CrewAI's delegation are both building on.
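The detection trick can be sketched framework-free in a few lines. Everything here is illustrative, not Swarm's actual internals:

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    functions: list = field(default_factory=list)

def run_turn(active: Agent, tool_name: str) -> Agent:
    """Dispatch one tool call; if the tool returns an Agent, hand control to it."""
    tool = next(f for f in active.functions if f.__name__ == tool_name)
    result = tool()
    if isinstance(result, Agent):   # the entire handoff mechanism is this check
        return result               # new active agent; the loop carries history along
    return active                   # ordinary tool call: same agent keeps speaking

billing = Agent(name="Billing")

def transfer_to_billing():
    return billing                  # returning an Agent triggers the handoff

def lookup_order():
    return "order #1234 found"      # returning anything else does not

support = Agent(name="Support", functions=[transfer_to_billing, lookup_order])

print(run_turn(support, "lookup_order").name)         # Support
print(run_turn(support, "transfer_to_billing").name)  # Billing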


The Full Framework Picture

Now that you've seen all four, here's how they relate:

|                   | LangGraph                                      | CrewAI                         | AutoGen                         | OpenAI Swarm         |
|-------------------|------------------------------------------------|--------------------------------|---------------------------------|----------------------|
| Mental model      | State machine / graph                          | Role-based team                | Conversation / dialogue         | Routines + handoffs  |
| Learning curve    | High                                           | Low                            | Medium                          | Very low             |
| Production ready  | Yes (v1.0)                                     | Yes                            | Yes (v0.4)                      | No (educational)     |
| Parallelism       | Yes (Send, static edges)                       | Limited                        | No (sequential turns)           | No                   |
| Code execution    | Via Bash tool                                  | Via CodeInterpreterTool        | Built-in (sandboxed)            | No                   |
| Human-in-the-loop | Interrupt pattern                              | human_input=True               | UserProxyAgent                  | Manual               |
| Best for          | Complex branching, compliance, production scale | Fast prototyping, role pipelines | Conversational loops, code tasks | Learning the pattern |

The decision in practice:

  • Building something for production that needs reliability and auditability → LangGraph
  • Prototyping a multi-agent pipeline quickly → CrewAI
  • Building a critique/revision loop or code execution workflow → AutoGen
  • Learning how the handoff pattern works → OpenAI Swarm

Most real projects use LangGraph or CrewAI for production. AutoGen is genuinely useful for specific use cases (code execution, conversational agents). OpenAI Swarm is for learning — not shipping.


Where Things Go Wrong

AutoGen token costs spiral. Every agent sees the full conversation history on every turn. Ten rounds between 3 agents means 30 LLM calls, each one re-sending a longer history than the last. Without a MaxMessageTermination, costs compound fast. Always set one.

AutoGen v0.4 is a breaking redesign. Most tutorials online use the old v0.2 API — initiate_chat(), ConversableAgent as the main class. The new v0.4 API uses AssistantAgent and team classes. They are not compatible. Don't mix them.

SelectorGroupChat can surprise you. When the LLM picks who speaks next, the conversation can go in unexpected directions. Harder to predict and debug than RoundRobinGroupChat. Start with round-robin, switch to selector only when the routing logic genuinely needs to be dynamic.

OpenAI Swarm missing features. Swarm has no persistent memory, no async support, no streaming, no production-grade error handling. Use it to learn the pattern — then implement that pattern in LangGraph or build on the OpenAI Agents SDK for production.

