AI Engineering Curriculum
Phase 2: Single AI Agent Development · 11 min read

Module 2.3

Agentic Patterns

What a Pattern Actually Is

An agent loop is just: think, act, observe, repeat. But how an agent thinks before acting — whether it plans ahead, reflects on its own output, runs tools in parallel or sequence — changes everything about its reliability, cost, and behavior.

Agentic patterns are proven recipes for structuring that thinking. They're not framework features or library calls. They're prompt-level and architecture-level decisions you make when designing an agent. The right pattern for a task can be the difference between an agent that completes reliably and one that loops endlessly or produces garbage.


Workflows vs. Agents — Choosing Your Architecture

Before picking a pattern, pick the right architecture class.

A workflow is an LLM system with a predetermined structure. The flow of execution is defined in your code. The model fills in steps — generates text, makes classifications, produces outputs — but you decide what those steps are and in what order they run.

An agent is an LLM system that directs its own execution. The model decides what tools to call, in what order, and for how many steps. The flow emerges at runtime from the model's judgment — not from your code.

The key mental model: Workflows are deterministic pipelines with LLM-powered steps baked in. Agents are open loops where the model calls the shots.

This distinction matters because it shapes every design decision that follows. Workflows are more predictable, cheaper to run, and easier to debug. Agents are more flexible and handle open-ended tasks — but they're harder to control and more expensive.

Anthropic's rule of thumb: start with the simplest architecture that can do the job. A well-designed workflow beats an autonomous agent for most structured tasks. Add autonomy only when the task genuinely requires the model to make routing decisions you can't predict at design time.


Prompt Chaining — The Simplest Workflow

Prompt chaining is the pattern where a task is broken into a fixed sequence of LLM calls, with each call processing the output of the previous one.

Input → [LLM Step 1] → Output → [LLM Step 2] → Output → [LLM Step 3] → Final

It's not one big prompt trying to do everything at once — it's a pipeline. Each step is focused on exactly one sub-task, which makes it more reliable than asking a single prompt to handle the whole job.

A real example: generate marketing copy, verify it meets length constraints, then translate it.

Python
import asyncio

async def marketing_pipeline(product: str) -> dict:
    # Step 1: Generate
    copy = await generate("Write 80-word marketing copy for: " + product)

    # Gate: programmatic check before passing downstream
    if len(copy.split()) > 100:
        copy = await generate("Shorten this to under 80 words: " + copy)

    # Step 2 & 3: Transform (parallel — they don't depend on each other)
    spanish, french = await asyncio.gather(
        generate("Translate to Spanish: " + copy),
        generate("Translate to French: " + copy),
    )
    return {"en": copy, "es": spanish, "fr": french}

The gate is the key ingredient. Between steps, add a programmatic check — a simple if condition, a format validator, or another LLM call — to verify the output meets your criteria before it flows downstream. Gates are what prevent garbage from propagating through the chain.
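A gate can be as small as a pure function returning a verdict plus a reason the next step can act on. A minimal sketch (the name `length_gate` and the return shape are illustrative, not a framework API):

```python
def length_gate(text: str, max_words: int = 100) -> tuple[bool, str]:
    """Programmatic gate: returns (passed, reason).

    The reason string can be fed straight into a corrective LLM call,
    e.g. a "shorten this" retry prompt, before output flows downstream.
    """
    n = len(text.split())
    if n > max_words:
        return False, f"too long: {n} words (limit {max_words})"
    return True, "ok"
```

The same shape works for format validators, schema checks, or an LLM-as-judge call: pass/fail drives control flow, and the reason feeds the retry.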

When to use it: Tasks with a fixed, enumerable sequence of transformations. Document processing pipelines. Multi-step content generation. Any workflow where you can draw all the boxes and arrows before writing a single line of code.

When it breaks down: When you can't enumerate the steps upfront. When step B needs to branch differently depending on what step A returned. That's when you need an agent with dynamic decision-making — ReAct, not a chain.


ReAct — The Default

ReAct (Reason + Act) is the pattern you've been using without naming it. Before each tool call, Claude explicitly states its reasoning. The visible thought process becomes part of the context that informs the next step.

Thought: The user wants NVIDIA's current stock price.
         I need live data — I can't answer from training knowledge.
Action: search_web("NVIDIA stock price today")
Observation: $875.23 as of market close
Thought: I have the data. I can answer directly now.
Answer: NVIDIA closed at $875.23 today.

This works because it forces the model to commit to reasoning before acting. Each token generated becomes context for the next — by writing out a thought, Claude gives itself better information to act on.
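If you drive this loop yourself rather than through a framework, you need to pull the labeled parts out of the model's text before dispatching a tool. A hypothetical regex-based parser for the trace format above (the dict keys are illustrative, not a standard schema):

```python
import re

def parse_react_step(response: str) -> dict:
    """Split a ReAct-formatted response into its labeled parts.

    The Thought/Action/Answer labels match the trace format above;
    a missing section comes back as None.
    """
    thought = re.search(r"Thought:\s*(.*?)(?=\n(?:Action|Answer):|\Z)", response, re.S)
    action = re.search(r"Action:\s*(\w+)\((.*)\)", response)
    answer = re.search(r"Answer:\s*(.*)", response, re.S)
    return {
        "thought": thought.group(1).strip() if thought else None,
        "tool": action.group(1) if action else None,
        "tool_input": action.group(2).strip().strip('"') if action else None,
        "answer": answer.group(1).strip() if answer else None,
    }
```

An `answer` of None tells your loop to execute the tool and feed the observation back; a non-None `answer` is the exit condition.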

When to use it: Almost always. It's the sensible default for any multi-step task. It also gives you observability — you can see what Claude was thinking when something goes wrong.


Plan-then-Execute — When the Scope Is Known

Sometimes you know the full task upfront and the steps don't depend on each other's results. Making a complete plan first and executing it step by step is more efficient than figuring it out as you go.

User: "Refactor all API endpoints to use the new auth middleware"

Plan:
1. List all files in /routes containing endpoint definitions
2. For each file: read it, identify affected functions, apply the middleware wrapper
3. Run tests after all changes are made
4. If tests fail, revert the last change and report

Execute: [carry out each step in order]

The key difference from ReAct: in ReAct, the agent decides what to do next at each step. In Plan-then-Execute, it decides everything upfront and becomes an executor. This makes behavior more predictable and easier to audit — you can inspect the plan before any changes happen.
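Once the plan is fixed, the execute phase can be a plain loop with no model judgment in it. A sketch, assuming each approved step has been turned into a named callable (names and return shape are illustrative):

```python
def execute_plan(steps: list, abort_on_failure: bool = True) -> list[tuple[str, str]]:
    """Run pre-approved plan steps in order, recording each outcome.

    Stops at the first failure so a human (or the agent) can inspect
    partial results before anything else runs.
    """
    results = []
    for name, step in steps:
        try:
            results.append((name, f"ok: {step()}"))
        except Exception as exc:
            results.append((name, f"failed: {exc}"))
            if abort_on_failure:
                break
    return results
```

The audit point lives outside this function: show the step list to a human before calling it.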

When to use it: Tasks with a clear, enumerable set of steps that don't require dynamic course correction — code refactoring, batch processing, systematic audits.

The limitation: If an early step fails unexpectedly, a rigid plan doesn't adapt well. ReAct handles surprises better.


Reflection — For High-Stakes Output

Reflection adds a review step before returning an answer. After completing a task, the agent examines its own output, critiques it, and revises before the user ever sees it.

A more powerful version uses two separate agents — a Generator and a Critic:

Generator produces:  [draft output]
Critic evaluates:    "Missing error handling for 429 responses.
                      Auth logic doesn't handle token expiry."
Generator revises:   [improved output incorporating the critique]

Why it works: A model evaluating its own completed output catches different errors than the same model generating it token by token. The critique benefits from seeing the full picture at once.
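The Generator-Critic round trip is only a handful of lines once the two LLM calls are injected as functions. A sketch with placeholder callables (the convention that the critic answers "OK" when satisfied is an assumption, not an API):

```python
def reflect(task: str, generate, critique) -> str:
    """One Generator-Critic pass: draft, critique, revise if needed.

    `generate` and `critique` stand in for your two LLM calls; the
    critic is assumed to reply "OK" when it finds nothing to fix.
    """
    draft = generate(task)
    feedback = critique(draft)
    if feedback.strip().upper().startswith("OK"):
        return draft  # nothing to fix, skip the revision call
    return generate(
        f"Task: {task}\n\nDraft:\n{draft}\n\n"
        f"Critique:\n{feedback}\n\nRevise the draft to address the critique."
    )
```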

When to use it: Code going to production. Financial calculations. Any output where a silent mistake is costly. The extra API call is worth it.


Evaluator-Optimizer — The Refinement Loop

Evaluator-Optimizer is the pattern where one LLM generates output and a separate LLM evaluates it — and this loop runs until the evaluator is satisfied or a maximum iteration count is hit.

It extends what Reflection does. The difference: Reflection is the model critiquing its own output in a single pass. Evaluator-Optimizer is a loop with a binary exit condition — the evaluator says PASS or FAIL, and that drives the next iteration.

[Generator] → output → [Evaluator] → PASS → return output
                              ↓
                         FAIL + feedback
                              ↓
             [Generator] ← revised prompt (loop continues)
Python
async def evaluator_optimizer(task: str, criteria: str, max_iterations: int = 3) -> str:
    output = await generator(task)
    for _ in range(max_iterations):
        verdict = await evaluator(
            f"Does this output meet the criteria?\n\n"
            f"Criteria: {criteria}\n\nOutput: {output}\n\n"
            f"Respond: PASS or FAIL followed by specific feedback."
        )
        if verdict.startswith("PASS"):
            return output
        feedback = verdict[5:].strip()
        output = await generator(
            f"Task: {task}\n\nPrevious attempt:\n{output}\n\n"
            f"Feedback:\n{feedback}\n\nRevise and improve."
        )
    return output  # best attempt after max iterations

The critical requirement: this only works when you have clear, evaluable criteria. "Make it better" is not a criterion. "Verify all code examples compile without errors" is. "Check every factual claim has a source" is. Vague criteria produce a loop that spins without making meaningful progress.

When to use it: Code generation where correctness can be tested. Translations that must hit a specific quality threshold. Any output where you can define a machine-checkable standard for "done."

The cost reality: Every iteration is another full API call. A 3-round loop costs 3x generation plus evaluator overhead. Cap max_iterations conservatively and track whether the loop actually runs multiple rounds in practice. If it almost always passes on the first try, use Reflection instead — same idea, half the complexity.


Parallelism — The Practical Speedup

Independent subtasks can run simultaneously. This is one of the simplest ways to make agents faster.

Python
import asyncio

async def parallel_research(topics: list[str]):
    tasks = [
        query(prompt=f"Research {topic} and summarize key findings")
        for topic in topics
    ]
    results = await asyncio.gather(*tasks)
    return results

The rule is simple: independent tasks in parallel, dependent tasks in sequence. If task B needs the output of task A, they must be sequential. If they don't depend on each other, there's no reason to wait.


Context Management — The Problem Nobody Talks About

Every message adds tokens. Every tool result. Every assistant turn. In a long-running agent, the context window fills up — and when it does, the agent stops working. This is one of the most common production failures with agents.

Three strategies:

Sliding window — keep only the N most recent turns:

Python
MAX_TURNS = 20
# Each turn adds two messages (user + assistant), hence the * 2
if len(messages) > MAX_TURNS * 2:
    messages = messages[-(MAX_TURNS * 2):]

Summarize and compress — when history gets long, have Claude summarize it and replace the history with that summary. Preserves key information without the token cost.
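The summarize-and-compress move can be sketched with the summarization call injected as a function (the role/content dict shape follows the Messages API convention; `keep_recent` is an illustrative knob):

```python
def compress_history(messages: list[dict], summarize, keep_recent: int = 6) -> list[dict]:
    """Replace old turns with one summary message; keep recent turns verbatim.

    `summarize` stands in for an LLM call that condenses a list of
    messages into a short paragraph.
    """
    if len(messages) <= keep_recent:
        return messages  # nothing worth compressing yet
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = summarize(old)
    return [{"role": "user", "content": f"Summary of earlier conversation: {summary}"}] + recent
```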

External memory (RAG) — store information in a vector database, retrieve only the relevant chunks when needed. The most powerful approach — covered in Module 2.4.

The agent implication: A demo that works for 10 turns but fails at turn 40 isn't a production agent. Context management is what makes the difference.


Stopping Conditions

Every agent loop needs an exit condition. Without one, you have an infinite loop waiting to happen. stop_reason == "end_turn" isn't always enough — always set a hard turn limit:

Python
MAX_TURNS = 50
for turn in range(MAX_TURNS):
    response = client.messages.create(...)
    if response.stop_reason == "end_turn":
        break
else:
    print("WARNING: Agent hit max turns without completing")

For irreversible actions — deleting data, sending emails, pushing to production — insert a human checkpoint:

Python
if action_is_destructive(tool_name, tool_input):
    confirm = input(f"Claude wants to: {tool_name}({tool_input}). Allow? [y/n] ")
    if confirm.lower() != "y":
        return "Action denied by user"

The minimal footprint principle: Agents should request only the permissions they need, prefer reversible actions, and check in before doing anything that can't be undone. The tighter you constrain an agent, the safer it is to give it more autonomy.


ACI — Engineering Your Tools Like a Product

The biggest lever most engineers miss when building agents isn't the prompt. It's the tools.

ACI stands for Agent-Computer Interface — the agent-side counterpart of HCI (Human-Computer Interaction). Just as HCI design determines how well a human can navigate a piece of software, ACI design determines how reliably an agent can use a tool. Poor ACI looks like a reasoning failure, but it's actually an interface failure.

The mental model: when you define a tool, you're not just writing a function — you're designing an interface the model will read and decide how to call. Unclear descriptions lead to wrong calls. Ambiguous parameters lead to bad arguments. Poor naming leads to the model picking the wrong tool entirely.

Anthropic's engineering team discovered this directly while building their coding agent: they spent more time optimizing tool definitions than the main system prompt. One specific change — requiring absolute file paths instead of relative paths — reduced model errors significantly. The prompt didn't change. The tool interface did.

Four principles for good ACI:

1. Use formats the model already knows. Name things like standard library code. Use JSON with descriptive keys. Don't invent novel schemas — the model has never seen your custom schema, but it's seen thousands of well-named standard library functions.

2. Minimize cognitive load. Don't make the model count lines, calculate offsets, or escape strings to use a tool correctly. If correct usage requires mental arithmetic, redesign the tool.

3. Treat the docstring as a prompt. Include what the tool does, what it returns, an example call, and what to do when it fails. Your docstring is the model's only documentation.

4. Prevent errors structurally. If you can make a wrong call impossible by changing the interface, do it. Accept IDs instead of names when uniqueness matters. Return structured error objects instead of raw exceptions.
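Principle 4 in miniature: a file-reading tool that rejects bad calls with structured error objects instead of raising. A sketch; the tool name and error fields are illustrative:

```python
import os

def read_text_file(absolute_path: str) -> dict:
    """Read a file, returning structured results the model can act on.

    Wrong calls come back as {"error": ...} dicts with enough detail
    for the model to self-correct, rather than as raw tracebacks.
    """
    if not os.path.isabs(absolute_path):
        return {"error": "path must be absolute", "got": absolute_path}
    try:
        with open(absolute_path, encoding="utf-8") as f:
            return {"success": True, "content": f.read()}
    except FileNotFoundError:
        return {"error": "file not found", "got": absolute_path}
```

Note how the absolute-path requirement is enforced by the interface itself, echoing the fix Anthropic's team made to their coding agent's tools.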

Python
# Bad: ambiguous, high cognitive load
def edit_file(file, start, end, content):
    """Edit a file."""


# Good: clear contract, self-documenting, error-resistant
def replace_lines(
    absolute_file_path: str,  # e.g. "/home/user/project/src/main.py"
    start_line: int,          # 1-indexed, inclusive
    end_line: int,            # 1-indexed, inclusive
    replacement_text: str,
) -> dict:
    """
    Replace a range of lines in a file with new content.

    Returns: {"success": true, "lines_replaced": N}
             or {"error": "reason"}

    Example:
        replace_lines("/home/user/app.py", 10, 12, "def new_fn():\n    pass\n")
    """

The agent implication: Tool quality is a hard ceiling on agent reliability. A well-prompted agent with poorly designed tools will still fail on tool-use tasks. Fix the interface before tuning the prompt.


Where Things Go Wrong

Infinite loops. An agent that can't complete a task will keep trying. Without a max_turns limit, it burns through your API budget indefinitely. Always set a hard ceiling.

Trusting visible reasoning. ReAct makes Claude's thinking visible, but visible reasoning can still be wrong. Don't treat the agent's stated reasoning as a substitute for actually verifying its output.

Over-planning on uncertain tasks. Plan-then-Execute assumes you know the full scope upfront. If the task is exploratory or the steps depend on what you discover, a rigid plan breaks. Use ReAct for anything with unknowns.

Reflection costs money. Every reflection pass is another full API call. Reserve it for outputs where correctness genuinely matters — not for every task.


Sources