Tool Use (Function Calling)

The Fundamental Misunderstanding

Most people, when they first encounter tool use, assume the model is running code. It isn't.

The model cannot execute anything. It has no access to your file system, the internet, or any external service. All it can do is generate text. When Claude "uses a tool," what actually happens is:

1. Claude outputs a structured JSON blob saying "I'd like to call this function with these arguments."
2. Your code reads that JSON and runs the actual function.
3. You feed the result back to Claude.
4. Claude continues its response using that new information.

Claude is the brain deciding what to do. Your code is the hands doing it.

This distinction matters because it means you control everything. You decide which tools exist, what they can access, and whether to actually run them. An agent that "browses the web" only does so because someone wrote a search_web() function and handed it to the model.
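To make the split between "brain" and "hands" concrete, here is a minimal sketch with no API call at all. The JSON string stands in for what a model might emit, and get_weather, TOOLS, and execute_tool_request are hypothetical names invented for this illustration. Nothing runs until your own code decides to run it.

```python
import json

# Hypothetical tool implementation; a real one would call a weather service.
def get_weather(location: str) -> str:
    return f"14°C, overcast in {location}"

# Registry of the tools your code is willing to run.
TOOLS = {"get_weather": get_weather}

# Pretend this text came back from the model. It is only a request.
model_output = '{"name": "get_weather", "input": {"location": "Tokyo, Japan"}}'

def execute_tool_request(raw: str) -> str:
    """Parse the model's request and run the matching function, if allowed."""
    request = json.loads(raw)
    handler = TOOLS.get(request["name"])
    if handler is None:
        # The model asked for something you never offered: refuse.
        return f"Unknown tool: {request['name']}"
    return handler(**request["input"])

result = execute_tool_request(model_output)
print(result)
```

Because the registry is yours, removing a function from TOOLS is enough to take that capability away from the agent.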

How You Define a Tool

A tool definition has three parts: a name, a description, and a JSON Schema describing the inputs.

Python

tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a location. Returns temperature in Celsius and a short condition description like 'sunny' or 'rainy'. Use when the user asks about weather in a specific city or region.",
        "input_schema": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City and country, e.g. 'Tokyo, Japan'"
                },
                "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"]
                }
            },
            "required": ["location"]
        }
    }
]

The description is doing far more work than it looks like. Claude reads it to decide when to call this tool, whether it's the right one, and what to pass to it. Think of it like writing a job posting — the better you describe the role, the better the result.
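The schema is also something your own code can lean on. The sketch below is a hand-rolled check covering only required fields, string types, and enum values; it is not a full JSON Schema validator, and check_tool_input is a hypothetical helper, not part of any SDK. It shows what the get_weather schema above does and does not allow.

```python
def check_tool_input(schema: dict, tool_input: dict) -> list[str]:
    """Return a list of problems; an empty list means the input looks acceptable."""
    problems = []
    properties = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in tool_input:
            problems.append(f"missing required field: {name}")
    for name, value in tool_input.items():
        spec = properties.get(name)
        if spec is None:
            problems.append(f"unexpected field: {name}")
            continue
        if spec.get("type") == "string" and not isinstance(value, str):
            problems.append(f"{name} should be a string")
        if "enum" in spec and value not in spec["enum"]:
            problems.append(f"{name} must be one of {spec['enum']}")
    return problems

# Same shape as the input_schema in the get_weather definition.
weather_schema = {
    "type": "object",
    "properties": {
        "location": {"type": "string"},
        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    },
    "required": ["location"],
}

print(check_tool_input(weather_schema, {"location": "Tokyo, Japan"}))
print(check_tool_input(weather_schema, {"unit": "kelvin"}))
```

Running a check like this before executing a tool gives you a readable error to hand back to Claude when an argument is missing or malformed.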

The Agentic Loop

Python

import anthropic
client = anthropic.Anthropic()

# Step 1: Send the task + available tools
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "What's the weather in Tokyo right now?"}]
)

# Step 2: Claude responds with stop_reason "tool_use"
if response.stop_reason == "tool_use":
    tool_block = next(b for b in response.content if b.type == "tool_use")
    tool_name   = tool_block.name    # "get_weather"
    tool_input  = tool_block.input   # {"location": "Tokyo, Japan"}
    tool_use_id = tool_block.id      # "toolu_01XYZ..."

    # Step 3: YOUR code runs the actual function
    result = my_weather_api(tool_input["location"])

    # Step 4: Send the result back
    response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=1024,
        tools=tools,
        messages=[
            {"role": "user", "content": "What's the weather in Tokyo right now?"},
            {"role": "assistant", "content": response.content},  # include the tool_use block
            {
                "role": "user",
                "content": [{
                    "type": "tool_result",
                    "tool_use_id": tool_use_id,  # must match exactly
                    "content": result
                }]
            }
        ]
    )

print(response.content[0].text)

Two rules that must always hold: include the assistant's tool_use block in your next call, and match the tool_use_id exactly. Both are required for Claude to connect the result to the right request.
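The example above handles exactly one tool call. A real agent keeps going until Claude stops asking. Here is one way that loop might look, written against a stand-in so it runs without an API key: run_tool_loop, fake_create, and the handlers dict are all hypothetical, and fake_create only imitates the shape of a response (stop_reason plus a list of content blocks).

```python
from types import SimpleNamespace

def run_tool_loop(create, messages, handlers, max_turns=5):
    """Call `create` until the model stops asking for tools or max_turns is hit."""
    for _ in range(max_turns):
        response = create(messages=messages)
        if response.stop_reason != "tool_use":
            return response
        # Rule 1: echo the assistant turn back, tool_use blocks included.
        messages.append({"role": "assistant", "content": response.content})
        results = []
        for block in response.content:
            if block.type == "tool_use":
                output = handlers[block.name](**block.input)
                # Rule 2: reuse the id from the request verbatim.
                results.append(
                    {"type": "tool_result", "tool_use_id": block.id, "content": output}
                )
        messages.append({"role": "user", "content": results})
    raise RuntimeError("tool loop did not finish within max_turns")

def fake_create(messages):
    # Stand-in for the API call: asks for one tool, then answers.
    if len(messages) == 1:
        block = SimpleNamespace(
            type="tool_use", name="get_weather", id="toolu_demo",
            input={"location": "Tokyo, Japan"},
        )
        return SimpleNamespace(stop_reason="tool_use", content=[block])
    tool_output = messages[-1]["content"][0]["content"]
    text = SimpleNamespace(type="text", text=f"It is {tool_output} in Tokyo.")
    return SimpleNamespace(stop_reason="end_turn", content=[text])

history = [{"role": "user", "content": "What's the weather in Tokyo right now?"}]
final = run_tool_loop(
    fake_create, history, {"get_weather": lambda location: "14°C, overcast"}
)
print(final.content[0].text)
```

With the real client you would likely pass something like functools.partial(client.messages.create, model=..., max_tokens=..., tools=tools) as create. The max_turns cap is a design choice, not an API requirement: it keeps a confused agent from looping forever.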

Parallel Tool Calls

Claude can request multiple tools in a single response. All results must go back in one user message — not separate messages. Splitting them breaks the alternating conversation structure and causes unpredictable behavior.

Python

# ✅ Correct — one message, all results together
{
    "role": "user",
    "content": [
        {"type": "tool_result", "tool_use_id": "toolu_01", "content": "14°C, overcast"},
        {"type": "tool_result", "tool_use_id": "toolu_02", "content": "8°C, rainy"},
    ]
}
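In practice you build that message from whatever tool_use blocks came back. A small sketch, assuming blocks shaped like the ones in response.content; collect_tool_results and the fake_weather lookup are hypothetical, and SimpleNamespace objects stand in for real SDK blocks.

```python
from types import SimpleNamespace

def collect_tool_results(content_blocks, handlers):
    """Run every requested tool and return ONE user message holding all results."""
    results = [
        {
            "type": "tool_result",
            "tool_use_id": block.id,
            "content": handlers[block.name](**block.input),
        }
        for block in content_blocks
        if block.type == "tool_use"  # skip text blocks
    ]
    return {"role": "user", "content": results}

# Stand-ins for the blocks of a response that requests two tools at once.
blocks = [
    SimpleNamespace(type="text", text="Checking both cities."),
    SimpleNamespace(type="tool_use", id="toolu_01", name="get_weather",
                    input={"location": "Tokyo, Japan"}),
    SimpleNamespace(type="tool_use", id="toolu_02", name="get_weather",
                    input={"location": "London, UK"}),
]
fake_weather = {"Tokyo, Japan": "14°C, overcast", "London, UK": "8°C, rainy"}

message = collect_tool_results(
    blocks, {"get_weather": lambda location: fake_weather[location]}
)
print(message)
```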

Controlling Tool Behavior

Python

tool_choice={"type": "auto"}                          # Claude decides (default)
tool_choice={"type": "any"}                           # force at least one tool call
tool_choice={"type": "tool", "name": "extract_info"}  # force a specific tool
tool_choice={"type": "none"}                          # plain text only, ignore tools

Be careful with "any" — forcing tool use when Claude doesn't need it produces degraded output. Use it only when a tool call is always appropriate for the task.

When Tools Fail

Return errors as tool results instead of letting exceptions escape your loop. Claude can read a returned error and adapt (retry, try another tool, or tell the user); an unhandled exception crashes the agent and loses the conversation.

Python

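try:
    result_content = call_weather_api(location)
    is_error = False
except ConnectionError as e:
    # Turn the failure into text Claude can read and react to
    result_content = f"Weather API unavailable: {e}"
    is_error = True

# Send this block back in the next user message, just like a successful result
tool_result = {
    "type": "tool_result",
    "tool_use_id": tool_use_id,
    "content": result_content,
    "is_error": is_error,
}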
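If you have more than a couple of tools, you may want that pattern in one place. The wrapper below is a sketch, not an SDK feature: run_tool_safely and flaky_weather are hypothetical names, and catching Exception broadly is a deliberate choice so that any failure becomes a message rather than a crash.

```python
def run_tool_safely(tool_use_id, func, tool_input):
    """Run a tool and always return a tool_result block, even on failure."""
    try:
        content, is_error = str(func(**tool_input)), False
    except Exception as exc:  # broad on purpose: every failure becomes text
        content, is_error = f"{type(exc).__name__}: {exc}", True
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "content": content,
        "is_error": is_error,
    }

def flaky_weather(location):
    # Simulates an upstream outage.
    raise ConnectionError("upstream timed out")

block = run_tool_safely("toolu_demo", flaky_weather, {"location": "Tokyo, Japan"})
print(block)
```

The same wrapper also covers the case where Claude passes arguments your function does not accept: the resulting TypeError comes back as an error result Claude can correct on its next turn.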

The Tool Runner (Skip the Loop)

The SDK's beta Tool Runner handles the loop automatically. Decorate a Python function with the SDK's beta_tool helper and pass it in; the runner reads your docstring and type hints to generate the tool definition:

Python

from anthropic import beta_tool

@beta_tool
def get_weather(location: str, unit: str = "celsius") -> str:
    """Get current weather for a given location.

    Args:
        location: City and country, e.g. 'Tokyo, Japan'
        unit: Temperature unit — 'celsius' or 'fahrenheit'
    """
    return "14°C, overcast"

runner = client.beta.messages.tool_runner(
    model="claude-opus-4-6",
    max_tokens=1024,
    tools=[get_weather],
    messages=[{"role": "user", "content": "Compare weather in Tokyo and London"}]
)

final = runner.until_done()
print(final.content[0].text)

Your code documentation is your tool documentation. No duplication needed.
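To see why that works, here is a rough sketch of how a definition could be derived from a function using only the standard library. This is not the SDK's implementation: tool_definition_from and JSON_TYPES are hypothetical, and the sketch ignores per-argument docstring sections, enums, and nested types.

```python
import inspect
from typing import get_type_hints

# Minimal mapping from Python annotations to JSON Schema type names.
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}

def tool_definition_from(func):
    """Build a tool definition dict from a function's signature and docstring."""
    hints = get_type_hints(func)
    properties, required = {}, []
    for name, param in inspect.signature(func).parameters.items():
        properties[name] = {"type": JSON_TYPES.get(hints.get(name), "string")}
        # Parameters without a default value are treated as required.
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "input_schema": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }

def get_weather(location: str, unit: str = "celsius") -> str:
    """Get current weather for a given location."""
    return "14°C, overcast"

definition = tool_definition_from(get_weather)
print(definition)
```

The takeaway is that a vague docstring produces a vague tool description, so the advice on descriptions applies to docstrings too.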

Writing Tool Descriptions That Actually Work

The description must answer four questions: what does this tool do, what do the inputs mean, when should Claude call it, and what does it return.

Python

# ❌ Fails in subtle ways
{"name": "search", "description": "Search for information."}

# ✅ Works reliably
{
    "name": "search_web",
    "description": (
        "Search the web for current information using a search engine. "
        "Returns the top 5 results with titles, URLs, and short summaries. "
        "Use when the user asks about recent events, current prices, live data, "
        "or anything that may have changed since the model's training cutoff. "
        "Do NOT use for questions answerable from general knowledge alone."
    ),
    ...
}

The last sentence — telling Claude when not to use the tool — is as important as telling it when to use it.

Where Things Go Wrong

Tool results before text. tool_result blocks must come first in a user message — any text goes after. Putting text first causes a 400 error.
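One way to make the ordering hard to get wrong is to build the content list through a helper. build_user_content is a hypothetical name used only for this sketch.

```python
def build_user_content(tool_results, text=None):
    """Place tool_result blocks first, then an optional text block."""
    content = list(tool_results)
    if text:
        content.append({"type": "text", "text": text})
    return content

content = build_user_content(
    [{"type": "tool_result", "tool_use_id": "toolu_01", "content": "14°C, overcast"}],
    text="Now compare that with London.",
)
print([block["type"] for block in content])
```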

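pause_turn stop reason. Server-side tools have a 10-iteration limit. When it is hit, the response stops with stop_reason "pause_turn". Append that response's content to the conversation as an assistant message and call the API again; Claude isn't finished, it just hit an internal ceiling.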
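A sketch of that continuation, again against a stand-in rather than the real API. continue_until_done and fake_create are hypothetical; fake_create pauses once and then finishes, which is enough to show the message bookkeeping.

```python
from types import SimpleNamespace

def continue_until_done(create, messages, max_continuations=5):
    """Re-send paused turns until the model finishes or the cap is reached."""
    response = create(messages=messages)
    continuations = 0
    while response.stop_reason == "pause_turn" and continuations < max_continuations:
        # Hand the partial assistant turn back so the model can pick up from there.
        messages.append({"role": "assistant", "content": response.content})
        response = create(messages=messages)
        continuations += 1
    return response

def fake_create(messages):
    # Stand-in for the API call: pauses on the first request, finishes on the second.
    if messages[-1]["role"] == "user":
        partial = SimpleNamespace(type="text", text="Searching...")
        return SimpleNamespace(stop_reason="pause_turn", content=[partial])
    done = SimpleNamespace(type="text", text="Done.")
    return SimpleNamespace(stop_reason="end_turn", content=[done])

history = [{"role": "user", "content": "Research this topic"}]
final = continue_until_done(fake_create, history)
print(final.stop_reason)
```

The caller should still check final.stop_reason afterwards, since the cap can be reached while the turn is still paused.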

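tool_choice: "any" with extended thinking. Incompatible, and forcing a specific tool with {"type": "tool", ...} is rejected the same way. Use "auto" (or "none") when you need thinking and tools together.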
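If you assemble request parameters in several places, a small guard can catch this before the API does. The check below is a sketch based on my understanding that only "auto" and "none" are accepted alongside extended thinking; check_tool_choice is a hypothetical helper.

```python
def check_tool_choice(tool_choice, thinking_enabled):
    """Raise early if tool_choice forces a tool while extended thinking is on."""
    forced = tool_choice.get("type") in {"any", "tool"}
    if thinking_enabled and forced:
        raise ValueError(
            f"tool_choice type '{tool_choice['type']}' cannot be combined "
            "with extended thinking; use 'auto' or 'none'"
        )
    return tool_choice

print(check_tool_choice({"type": "auto"}, thinking_enabled=True))
```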
