Module 0.4
The API Layer
How Code Talks to an LLM
Every LLM - Claude, GPT, Gemini - is accessed over HTTP. You send a JSON request, you get a JSON response. That's all that's happening under the hood.
The SDK (like the anthropic Python package) is just a convenience wrapper around that. It handles auth headers, request formatting, and response parsing so you don't have to.
```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-6-20250929",
    max_tokens=1024,
    temperature=0,
    system="You are a helpful assistant.",
    messages=[
        {"role": "user", "content": "What is an AI agent?"}
    ],
)

print(response.content[0].text)
```

Nothing magical. You're sending structured data to a URL and getting structured data back.
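To see that it really is just HTTP underneath, the same request can be assembled by hand with the standard library. The endpoint and headers below follow Anthropic's documented Messages API; `build_request` is our own helper name, not part of any SDK:

```python
import json
import os
import urllib.request

API_URL = "https://api.anthropic.com/v1/messages"

def build_request(prompt: str, model: str = "claude-opus-4-6-20250929") -> urllib.request.Request:
    """Assemble the same HTTP request the SDK builds for you."""
    payload = {
        "model": model,
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": prompt}],
    }
    headers = {
        "x-api-key": os.environ.get("ANTHROPIC_API_KEY", ""),
        "anthropic-version": "2023-06-01",  # API version header the service requires
        "content-type": "application/json",
    }
    return urllib.request.Request(
        API_URL, data=json.dumps(payload).encode(), headers=headers, method="POST"
    )

# With a valid key, sending it is one line:
# reply = json.loads(urllib.request.urlopen(build_request("What is an AI agent?")).read())
```

Everything the SDK does for you - auth header, version header, JSON serialization - is visible here in about ten lines.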
Multi-Turn Conversations
The model has no memory of its own. Every API call is stateless. You are responsible for passing the full conversation history each time.
```python
messages = []

# Turn 1
messages.append({"role": "user", "content": "What is RAG?"})
response = client.messages.create(
    model="claude-opus-4-6-20250929",
    max_tokens=512,
    messages=messages,
)
assistant_reply = response.content[0].text
messages.append({"role": "assistant", "content": assistant_reply})

# Turn 2 - the model sees the full history
messages.append({"role": "user", "content": "How does it help agents?"})
response = client.messages.create(
    model="claude-opus-4-6-20250929",
    max_tokens=512,
    messages=messages,
)
```

`messages` is just a list you keep appending to. You send the whole list on every call. This is also why context windows fill up - every turn adds more tokens to that list.
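The append-call-append pattern is easy to get wrong - forget to store the assistant's reply and the history silently breaks - so it's worth wrapping in a helper. A minimal sketch; `chat_turn` is our own name, not part of the SDK:

```python
def chat_turn(client, messages: list[dict], user_text: str,
              model: str = "claude-opus-4-6-20250929") -> str:
    """Append the user message, call the API, record the reply, return it."""
    messages.append({"role": "user", "content": user_text})
    response = client.messages.create(
        model=model,
        max_tokens=512,
        messages=messages,
    )
    reply = response.content[0].text
    messages.append({"role": "assistant", "content": reply})  # keep the history complete
    return reply
```

Because `messages` is mutated in place, calling `chat_turn` repeatedly with the same list hands the model the full history every time.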
Python Things You'll See Constantly
Virtual environments - Isolate a project's dependencies so packages don't conflict across projects.
```shell
python -m venv .venv
source .venv/Scripts/activate  # Windows Git Bash; use .venv/bin/activate on macOS/Linux
pip install anthropic
```

Environment variables - The standard way to store API keys. Never hardcode them in your files.
```python
import os

api_key = os.environ["ANTHROPIC_API_KEY"]  # reads from your shell environment
```

async/await - Agents often run multiple API calls at once. async is how Python handles concurrency without blocking.
```python
import asyncio

async def run_agents():
    # all three agents run concurrently; agent_one etc. are illustrative
    results = await asyncio.gather(
        agent_one.run(task),
        agent_two.run(task),
        agent_three.run(task),
    )
    return results
```

Type hints - Annotations that describe the type of a variable, parameter, or return value. They make agent code dramatically easier to read.
```python
def search(query: str, max_results: int = 10) -> list[dict]:
    ...
```

Error Handling
Two errors you'll hit constantly with LLM APIs:
| Error | Cause | Fix |
|---|---|---|
| `429 Rate Limited` | Too many requests too fast | Wait and retry with exponential backoff |
| Context length exceeded | Too many tokens in the conversation | Summarize or trim old messages |
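The "trim old messages" fix can be as simple as keeping only the most recent turns. A sketch - `trim_history` is a made-up name, and real systems often summarize dropped turns rather than discard them:

```python
def trim_history(messages: list[dict], max_messages: int = 20) -> list[dict]:
    """Keep only the most recent messages so the conversation fits the context window."""
    if len(messages) <= max_messages:
        return messages
    trimmed = messages[-max_messages:]
    # drop any leading assistant message so the history still starts with a user turn
    while trimmed and trimmed[0]["role"] != "user":
        trimmed = trimmed[1:]
    return trimmed
```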
Exponential backoff means: wait 1s, retry. If it fails, wait 2s, retry. Then 4s, then 8s. Prevents hammering the API.
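That schedule fits in a few lines. A sketch under simplifying assumptions - `call_with_backoff` is our own name, and it retries on any exception, whereas production code would catch only rate-limit errors (the SDK raises typed exceptions you can match on):

```python
import time

def call_with_backoff(fn, max_retries: int = 5, base_delay: float = 1.0):
    """Call fn(); on failure wait 1s, 2s, 4s, 8s, ... before each retry."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error
            time.sleep(base_delay * 2 ** attempt)  # doubles each attempt
```

Wrap any API call in it: `call_with_backoff(lambda: client.messages.create(...))`.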