Module 0.4
The API Layer
How Code Talks to an LLM
Every LLM - Claude, GPT, Gemini - is accessed over HTTP. You send a JSON request, you get a JSON response. That's all that's happening under the hood.
The SDK (like the anthropic Python package) is just a convenience wrapper around that. It handles auth headers, request formatting, and response parsing so you don't have to.
```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-6-20250929",
    max_tokens=1024,
    temperature=0,
    system="You are a helpful assistant.",
    messages=[
        {"role": "user", "content": "What is an AI agent?"}
    ],
)

print(response.content[0].text)
```

Nothing magical. You're sending structured data to a URL and getting structured data back.
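To see that it really is just HTTP underneath, the same request can be assembled by hand with the standard library. The endpoint and headers below follow Anthropic's documented Messages API; `build_request` is our own helper name, not part of any SDK:

```python
import json
import os
import urllib.request

API_URL = "https://api.anthropic.com/v1/messages"

def build_request(prompt: str, model: str = "claude-opus-4-6-20250929") -> urllib.request.Request:
    """Assemble the same HTTP request the SDK builds for you."""
    payload = {
        "model": model,
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": prompt}],
    }
    headers = {
        "x-api-key": os.environ.get("ANTHROPIC_API_KEY", ""),
        "anthropic-version": "2023-06-01",  # API version header the service requires
        "content-type": "application/json",
    }
    return urllib.request.Request(
        API_URL, data=json.dumps(payload).encode(), headers=headers, method="POST"
    )

# With a valid key, sending it is one line:
# reply = json.loads(urllib.request.urlopen(build_request("What is an AI agent?")).read())
```

Everything the SDK does for you - auth header, version header, JSON serialization - is visible here in about ten lines.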
Multi-Turn Conversations
The model has no memory of its own. Every API call is stateless. You are responsible for passing the full conversation history each time.
```python
messages = []

# Turn 1
messages.append({"role": "user", "content": "What is RAG?"})
response = client.messages.create(
    model="claude-opus-4-6-20250929",
    max_tokens=512,
    messages=messages,
)
assistant_reply = response.content[0].text
messages.append({"role": "assistant", "content": assistant_reply})

# Turn 2 - the model sees the full history
messages.append({"role": "user", "content": "How does it help agents?"})
response = client.messages.create(
    model="claude-opus-4-6-20250929",
    max_tokens=512,
    messages=messages,
)
```

`messages` is just a list you keep appending to. You send the whole list on every call. This is also why context windows fill up - every turn adds more tokens to that list.
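The append-call-append pattern is easy to get wrong - forget to store the assistant's reply and the history silently breaks - so it's worth wrapping in a helper. A minimal sketch; `chat_turn` is our own name, not part of the SDK:

```python
def chat_turn(client, messages: list[dict], user_text: str,
              model: str = "claude-opus-4-6-20250929") -> str:
    """Append the user message, call the API, record the reply, return it."""
    messages.append({"role": "user", "content": user_text})
    response = client.messages.create(
        model=model,
        max_tokens=512,
        messages=messages,
    )
    reply = response.content[0].text
    messages.append({"role": "assistant", "content": reply})  # keep the history complete
    return reply
```

Because `messages` is mutated in place, calling `chat_turn` repeatedly with the same list hands the model the full history every time.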
Python Things You'll See Constantly
Virtual environments - Isolate a project's dependencies so packages don't conflict across projects.
```shell
python -m venv .venv
source .venv/Scripts/activate  # Windows Git Bash; use .venv/bin/activate on macOS/Linux
pip install anthropic
```

Environment variables - The standard way to store API keys. Never hardcode them in your files.
```python
import os

api_key = os.environ["ANTHROPIC_API_KEY"]  # reads from your shell environment
```

async/await - Agents often run multiple API calls at once. async is how Python handles concurrency without blocking.
```python
import asyncio

async def run_agents():
    # all three agents run concurrently; agent_one etc. are illustrative
    results = await asyncio.gather(
        agent_one.run(task),
        agent_two.run(task),
        agent_three.run(task),
    )
    return results
```

Type hints - Annotations that describe the type of a variable, parameter, or return value. They make agent code dramatically easier to read.
```python
def search(query: str, max_results: int = 10) -> list[dict]:
    ...
```

Error Handling
Two errors you'll hit constantly with LLM APIs:
| Error | Cause | Fix |
|---|---|---|
| `429 Rate Limited` | Too many requests too fast | Wait and retry with exponential backoff |
| Context length exceeded | Too many tokens in the conversation | Summarize or trim old messages |
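The "trim old messages" fix can be as simple as keeping only the most recent turns. A sketch - `trim_history` is a made-up name, and real systems often summarize dropped turns rather than discard them:

```python
def trim_history(messages: list[dict], max_messages: int = 20) -> list[dict]:
    """Keep only the most recent messages so the conversation fits the context window."""
    if len(messages) <= max_messages:
        return messages
    trimmed = messages[-max_messages:]
    # drop any leading assistant message so the history still starts with a user turn
    while trimmed and trimmed[0]["role"] != "user":
        trimmed = trimmed[1:]
    return trimmed
```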
Exponential backoff means: wait 1s, retry. If it fails, wait 2s, retry. Then 4s, then 8s. Prevents hammering the API.
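That schedule fits in a few lines. A sketch under simplifying assumptions - `call_with_backoff` is our own name, and it retries on any exception, whereas production code would catch only rate-limit errors (the SDK raises typed exceptions you can match on):

```python
import time

def call_with_backoff(fn, max_retries: int = 5, base_delay: float = 1.0):
    """Call fn(); on failure wait 1s, 2s, 4s, 8s, ... before each retry."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error
            time.sleep(base_delay * 2 ** attempt)  # doubles each attempt
```

Wrap any API call in it: `call_with_backoff(lambda: client.messages.create(...))`.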