AI Engineering Curriculum
Phase 0: Foundations

Module 0.4

The API Layer

How Code Talks to an LLM

Every LLM - Claude, GPT, Gemini - is accessed over HTTP: you send a JSON request and get a JSON response back. That's all that's happening under the hood.

The SDK (like the anthropic Python package) is just a convenience wrapper around that. It handles auth headers, request formatting, and response parsing so you don't have to.

Python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from environment

response = client.messages.create(
    model="claude-opus-4-6-20250929",
    max_tokens=1024,
    temperature=0,
    system="You are a helpful assistant.",
    messages=[
        {"role": "user", "content": "What is an AI agent?"}
    ]
)

print(response.content[0].text)

Nothing magical. You're sending structured data to a URL and getting structured data back.
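
To see exactly what the SDK is wrapping, here is roughly the same call made with the requests library. This is a sketch, not the SDK's internals: it assumes the standard Messages endpoint URL and the anthropic-version header value documented at time of writing, so double-check both against Anthropic's current docs.

Python
import os

import requests  # third-party HTTP client: pip install requests

response = requests.post(
    "https://api.anthropic.com/v1/messages",
    headers={
        "x-api-key": os.environ["ANTHROPIC_API_KEY"],
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    },
    json={
        "model": "claude-opus-4-6-20250929",
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": "What is an AI agent?"}],
    },
)

# the response body is JSON too; the generated text lives in content[0]
print(response.json()["content"][0]["text"])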


Multi-Turn Conversations

The model has no memory of its own. Every API call is stateless. You are responsible for passing the full conversation history each time.

Python
messages = []

# Turn 1
messages.append({"role": "user", "content": "What is RAG?"})
response = client.messages.create(
    model="claude-opus-4-6-20250929",
    max_tokens=512,
    messages=messages
)
assistant_reply = response.content[0].text
messages.append({"role": "assistant", "content": assistant_reply})

# Turn 2 - model sees full history
messages.append({"role": "user", "content": "How does it help agents?"})
response = client.messages.create(
    model="claude-opus-4-6-20250929",
    max_tokens=512,
    messages=messages
)

messages is just a list you keep appending to, and you send the whole list on every call. This is also why context windows fill up: every turn adds more tokens to that list.
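
A blunt way to keep that list in check is to trim it before each call. A minimal sketch - the trim_history helper is illustrative, and production agents often summarize old turns rather than dropping them:

Python
def trim_history(messages: list[dict], max_messages: int = 20) -> list[dict]:
    # keep only the most recent turns; older ones never reach the model
    return messages[-max_messages:]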


Python Things You'll See Constantly

Virtual environments - Isolate a project's dependencies so packages don't conflict across projects.

Bash
python -m venv .venv
source .venv/Scripts/activate   # Windows Git Bash; on macOS/Linux: source .venv/bin/activate
pip install anthropic

Environment variables - The standard way to store API keys. Never hardcode them in your files.

Python
import os

api_key = os.environ["ANTHROPIC_API_KEY"]  # reads from your shell environment

async/await - Agents often run multiple API calls at once. async is how Python handles concurrency without blocking.

Python
import asyncio

async def run_agents():
    # asyncio.gather schedules all three coroutines at once;
    # all three API calls are in flight simultaneously
    results = await asyncio.gather(
        agent_one.run(task),
        agent_two.run(task),
        agent_three.run(task),
    )
    return results

Type hints - Annotations that describe what type a variable or function parameter is. Makes agent code dramatically easier to read.

Python
def search(query: str, max_results: int = 10) -> list[dict]: ...

Error Handling

Two errors you'll hit constantly with LLM APIs:

Error                   | Cause                                | Fix
429 Rate Limited        | Too many requests too fast           | Wait and retry with exponential backoff
Context length exceeded | Too many tokens in the conversation  | Summarize or trim old messages

Exponential backoff means: wait 1s, retry. If it fails, wait 2s, retry. Then 4s, then 8s. Prevents hammering the API.
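
Here's a minimal backoff sketch using the SDK's RateLimitError exception. The create_with_backoff wrapper and the retry count are illustrative choices, not part of the SDK:

Python
import time

import anthropic

client = anthropic.Anthropic()

def create_with_backoff(messages: list[dict], max_retries: int = 5):
    delay = 1
    for attempt in range(max_retries):
        try:
            return client.messages.create(
                model="claude-opus-4-6-20250929",
                max_tokens=512,
                messages=messages,
            )
        except anthropic.RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries, surface the error
            time.sleep(delay)
            delay *= 2  # wait 1s, 2s, 4s, 8s...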

