
Getting started with Claude Sonnet 4 and Claude Opus 4

How to set up and run Anthropic's new Claude Sonnet 4 and Claude Opus 4 on your machine in Python using the API.
Created on May 25|Last edited on May 27
Like everyone else, I was excited to jump right in and start playing around with Claude 4 as soon as I heard about it in the announcement. I figured you might, too, and so I put together this quickstart.
Claude 4 is Anthropic's latest generation of large language models, headlined by the new Sonnet 4 and Opus 4, and it sets a new bar for coding ability, reasoning, and agent workflows. These models support extended "thinking" and integrated tool use, which together let Claude reason step by step and call out to calculators, databases, or other services, all while transparently showing every step of its process.
In this quickstart, you’ll get hands-on with the raw Claude 4 API and W&B Weave so you can track, visualize, and debug exactly how Claude reasons and uses tools inside your own workflows.
I've also created a handy Colab for those who want to jump right to seeing it in action.
If you're just getting started and don't yet have your machine set up to run Python, here's a quick tutorial that will have you up and running in just a few minutes.

Getting your Anthropic API Key

Before you can begin using Claude Sonnet 4 and Claude Opus 4, you'll need an API key to access them. First, create an account with Anthropic, then head over to the Anthropic API Console and click "API Keys" in the bottom left corner.

Next, click the "Create Key" button to create your key:

Give your key a name and click the "Add" button:

You're good to go.

Claude Sonnet 4 and Opus 4 pricing

As of this writing (May 24, 2025), the pricing for Claude Opus 4 and Sonnet 4 is as follows (we've left Claude Sonnet 3.7 pricing in for reference):


W&B Weave

W&B Weave enhances our project by offering a streamlined way to track and evaluate model outputs. To use Weave, you start by importing it and initializing it with your project name.
The key feature is the @weave.op() decorator, which you add above any function you want to track. A decorator in Python is a special function that adds functionality to another function. By adding @weave.op() above your function definition, you instruct Weave to automatically log the inputs and outputs of that function. This makes it easy to monitor what data goes into the function and what results come out.
After running your code, you can view these logs in the Weave dashboard, which provides detailed visualizations and traces of the function calls. This helps in debugging and organizing your experimental data, making the development process with Claude more efficient and insightful.
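To make the decorator idea concrete, here's a toy stand-in written in plain Python (this is not Weave itself, and the `traced` name is invented for this sketch; the real `@weave.op()` sends these records to the Weave dashboard rather than keeping them in memory):

```python
import functools

def traced(fn):
    """Toy stand-in for @weave.op(): records each call's inputs and output."""
    calls = []  # simple in-memory log; Weave uploads traces to its dashboard instead

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        result = fn(*args, **kwargs)
        calls.append({"inputs": (args, kwargs), "output": result})
        return result

    wrapper.calls = calls  # expose the log for inspection
    return wrapper

@traced
def add(a, b):
    return a + b

add(2, 3)
print(add.calls)  # every call's inputs and output are now recorded
```

The pattern is the same with Weave: decorate the function, call it normally, and inspect the logged calls afterward.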

Getting started with Claude Sonnet 4 and Claude Opus 4

A couple notes before we jump in:
  • The screenshots in this tutorial will show me using Google Colab.
  • If you've never used Colab or Jupyter before, there's a nice quickstart here, and a refresher on key Markdown here.
  • One of the great things about Google Colab is that you can add your comments and context in text fields:


Step 1: The Anthropic API key

The first thing we need to do is set our Anthropic API key.
The code to do this is:
import os

if "ANTHROPIC_API_KEY" not in os.environ:
    os.environ["ANTHROPIC_API_KEY"] = "your api key"
You'll want to replace "your api key" with your Anthropic API Key.
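If you'd rather not paste the key directly into your notebook, one standard-library option is to prompt for it only when it's missing (the `ensure_api_key` helper below is a name invented for this sketch, not part of any SDK):

```python
import os
from getpass import getpass

def ensure_api_key(var: str = "ANTHROPIC_API_KEY") -> str:
    """Return the key from the environment, prompting interactively only if absent.

    Note: ensure_api_key is a helper invented for this tutorial.
    """
    if var not in os.environ:
        os.environ[var] = getpass(f"{var}: ")
    return os.environ[var]
```

This keeps the secret out of your saved notebook while still populating the environment variable the SDK reads.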

Step 2: Installing Anthropic and W&B Weave via Pip

To make Claude Sonnet 4 and Claude Opus 4 work, you only need the anthropic package. That said, we'll also show you how to effortlessly track your inputs and outputs using W&B Weave.
This is a good time to sign up for Weights & Biases. Doing so now saves you from having to pause the tutorial.
The code to do this is:
!pip install anthropic weave
and then run the cell.

Now that you've installed the libraries, we still need to import them for use.
If you're new to Python: when we install a library, we simply download its code; when we import it, we make that library available for use in our script.

Step 3: Writing the inference script

Now let's write a script that lets us run inference with either Claude Sonnet 4 or Claude Opus 4 using Anthropic's Python SDK. We'll provide an option for streaming outputs and for enabling Claude's extended thinking (which streams Claude's internal reasoning before the answer).
Below is a robust Python script (with a @weave.op for tracking inputs and outputs) that you can copy and adapt.
import os

import anthropic
import weave

# Optional: default API key fallback for demo (replace/remove for production!)
if "ANTHROPIC_API_KEY" not in os.environ:
    os.environ["ANTHROPIC_API_KEY"] = "your_claude_api_key"

if "WANDB_API_KEY" not in os.environ:
    os.environ["WANDB_API_KEY"] = "your_wandb_api_key"

weave.init("claude_4")
# === CONFIGURABLE ARGS ===
streaming = True # Set to True for streaming, False for classic response
enable_thinking = True # Set to True to enable extended thinking

# ==== MODEL SELECTION ====
# Claude Opus 4 (highest reasoning, highest cost)
model = "claude-opus-4-20250514"
# Claude Sonnet 4 (fast, less expensive)
# model = "claude-sonnet-4-20250514"

client = anthropic.Anthropic()


@weave.op
def claude_inference(prompt, model, streaming=True, enable_thinking=True):
    """
    Run inference on the given prompt with the specified Claude model,
    printing thinking and response as they arrive,
    and returning the full final response text.

    Returns: text response as a string, or None on failure
    """
    kwargs = dict(
        model=model,
        max_tokens=2048 if enable_thinking else 512,
        messages=[{"role": "user", "content": prompt}],
    )

    if enable_thinking:
        kwargs["thinking"] = {"type": "enabled", "budget_tokens": 1024}

    response_text = ""

    if streaming:
        with client.messages.stream(**kwargs) as stream:
            for event in stream:
                if event.type == "content_block_start":
                    block_type = event.content_block.type
                    if block_type == "thinking":
                        print("\n[THINKING]: ", end="", flush=True)
                    elif block_type == "text":
                        print("\n[RESPONSE]: ", end="", flush=True)
                elif event.type == "content_block_delta":
                    d = event.delta
                    if getattr(d, "type", None) == "thinking_delta":
                        print(d.thinking, end="", flush=True)
                    elif getattr(d, "type", None) == "text_delta":
                        print(d.text, end="", flush=True)
                        response_text += d.text
                elif event.type == "content_block_stop":
                    print()  # Finish this block
    else:
        response = client.messages.create(**kwargs)
        for block in response.content:
            if block.type == "thinking" and block.thinking.strip():
                print("\n[THINKING]:", block.thinking.strip(), flush=True)
            elif block.type == "text" and block.text.strip():
                print("\n[RESPONSE]:", block.text.strip(), flush=True)
                response_text += block.text.strip()

    return str(response_text)
# === USAGE EXAMPLE ===

prompt = (
    "If a train travels 60 miles per hour for 2.5 hours, how far does it go? "
    "Show your reasoning step by step."
)

final_answer = claude_inference(
    prompt,
    model=model,
    streaming=streaming,
    enable_thinking=enable_thinking,
)

print("\n\n=== FINAL RETURNED RESPONSE ===\n", final_answer)
To switch models, just change the model variable. For Opus 4, set model = "claude-opus-4-20250514". For Sonnet 4, set model = "claude-sonnet-4-20250514". Toggle streaming and enable_thinking at the top of the script to see reasoning and/or stream content as it becomes available.
After running our script, we will see a link where we can visualize the output of our model in Weave!

After clicking the link, we can view the inputs and outputs for our model:

Claude can answer in two different modes: standard (“non-thinking”) and “extended thinking.” In non-thinking mode, Claude just gives you the answer, like most chatbots. You'll only see the final response, with no insight into how the answer was developed.
In extended thinking mode, Claude will first show you its internal reasoning step by step (in "thinking blocks") and then give you the final answer. This helps you see how Claude arrived at the response, which can be valuable for transparency, debugging, or tasks that call for detailed logic.
In the code, we enabled thinking mode by setting enable_thinking = True and adding this block when constructing the API call:
if enable_thinking:
    kwargs["thinking"] = {"type": "enabled", "budget_tokens": 1024}
This tells Claude to use extended thinking and sets a "thinking budget"; the minimum value for budget_tokens is 1024 tokens. This number is the maximum number of output tokens Claude can use for its internal reasoning process. Your total max_tokens for the request must be greater than budget_tokens, because some tokens need to be reserved for the final answer in addition to the thinking tokens. For example, if your thinking budget is 1024, your max_tokens might be 2048, ensuring there's room for both the thinking and the answer.
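One way to keep these constraints straight is a small helper that builds the request kwargs and validates them up front. The `build_thinking_kwargs` name is invented for this tutorial (it's not part of the Anthropic SDK); it just encodes the two documented rules:

```python
def build_thinking_kwargs(model, prompt, max_tokens=2048, budget_tokens=1024):
    """Build messages.create kwargs with extended thinking enabled.

    Helper invented for this tutorial. Enforces the two constraints from the
    docs: budget_tokens must be at least 1024, and max_tokens must exceed
    budget_tokens so there's room left for the final answer.
    """
    if budget_tokens < 1024:
        raise ValueError("budget_tokens must be at least 1024")
    if max_tokens <= budget_tokens:
        raise ValueError("max_tokens must exceed budget_tokens")
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
    }

# e.g. client.messages.create(**build_thinking_kwargs(model, "Hello"))
```

This way an invalid budget fails fast locally instead of producing an API error.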

Using Claude Opus 4 and Claude Sonnet 4 with tool use

One of the most exciting features in Claude 4 Sonnet and Opus is tool use, namely the ability for Claude to call custom functions, like a calculator or database, during its reasoning process. This allows Claude to answer complex prompts using your real data, logic, or services, rather than relying on general world knowledge alone.
To illustrate, consider a business analytics scenario. Suppose you want to know, “What’s the total revenue if we sold 150 units at $50 each? Also, how does that compare to our average monthly revenue?” When you use tool use with Claude 4, you can let the model both perform the multiplication and look up your real or sample revenue data - all in one turn, without coding the business logic yourself.
To use this feature, you register your tools (such as a calculator or a data fetcher) when making a Claude API call. As Claude works through a task - say, “What’s total revenue for 150 units at $50, and how does that compare to our monthly average?” - it can call your functions to compute the number, look up data, and then reason about the results, all in one turn.
With interleaved thinking enabled, Claude will pause between tool calls to share its intermediate analysis, reflect on the latest results, and decide intelligently what to do next. This step-by-step reasoning gives the model a chance to think more deeply about which tools to use, instead of forcing it to predict all of its tool calls at once.
Getting started is straightforward: define your functions, describe their inputs and outputs, and pass them to Claude using the tools parameter. If you want more transparency and advanced reasoning, turn on interleaved thinking using the provided beta header. Claude will then alternate between thinking blocks, tool calls, and results, building up a solution with clarity at every step.
To show how interleaved thinking works, I'll share an example using both modes, and you can see what both responses look like inside Weave:

import anthropic
import ast
import os
import weave; weave.init("claude_4")

REVENUE_PER_MONTH = [5000, 6000, 4800, 5300, 5500, 5100, 4950, 5400, 5200, 5100, 5300, 5000]

def handle_db_command(command):
    cmd = command.lower().strip()
    if cmd == "get average monthly revenue":
        avg = sum(REVENUE_PER_MONTH) / len(REVENUE_PER_MONTH)
        return f"{avg:.2f}"
    elif cmd == "get total yearly revenue":
        return str(sum(REVENUE_PER_MONTH))
    elif cmd == "get best month":
        month = REVENUE_PER_MONTH.index(max(REVENUE_PER_MONTH)) + 1
        val = max(REVENUE_PER_MONTH)
        return f"Month {month}, revenue: {val}"
    else:
        return "Error: unknown database command"

def safe_eval(expr):
    try:
        node = ast.parse(expr, mode='eval')
        for sub in ast.walk(node):
            if not isinstance(sub, (ast.Expression, ast.BinOp, ast.UnaryOp, ast.Num, ast.Load, ast.operator, ast.unaryop, ast.Constant)):
                raise ValueError("Unsafe")
        return eval(expr, {"__builtins__": {}})
    except Exception as e:
        return f"Error: {e}"

calculator_tool = {
    "name": "calculator",
    "description": "Safely evaluates arithmetic expressions (e.g. '150 * 50')",
    "input_schema": {
        "type": "object",
        "properties": {
            "expression": {"type": "string", "description": "Math expression"}
        },
        "required": ["expression"]
    }
}

database_tool = {
    "name": "database",
    "description": (
        "A company revenue database. Supported commands (use as the `command` field):\n"
        "- 'get average monthly revenue' (Returns yearly average)\n"
        "- 'get total yearly revenue'\n"
        "- 'get best month' (Returns best month and its value)\n"
        "Do NOT use SQL. Only use one of the 3 supported commands."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "command": {"type": "string", "description": "Plain database command"}
        },
        "required": ["command"]
    }
}

PROMPT = (
    "What's the total revenue if we sold 150 units of product A at $50 each?\n"
    "Also, compare this to our average monthly revenue in our database.\n\n"
    "Instructions for the database tool:\n"
    "- Only use one of these commands for the 'command' field:\n"
    "  * get average monthly revenue\n"
    "  * get total yearly revenue\n"
    "  * get best month\n"
    "Do not use SQL. Only use one of these simple command phrases."
)

client = anthropic.Anthropic()

def execute_tools(blocks):
    """
    Find all tool calls in blocks.
    Return tool_result dicts to send as messages back to Claude.
    """
    tool_results = []
    for b in blocks:
        if b.type == "tool_use":
            if b.name == "calculator":
                val = safe_eval(b.input['expression'])
                out = str(val)
            elif b.name == "database":
                out = handle_db_command(b.input['command'])
            else:
                out = "Error: unknown tool"
            tool_results.append({"type": "tool_result", "tool_use_id": b.id, "content": out})
    return tool_results

# === ONE MODEL CALL (one assistant reply, may be tool call or final text) ===
@weave.op()
def claude_message_op(msg_history, interleaved=False):
    """
    Runs a single Anthropic API call (one assistant step).
    Returns a list of blocks (tool calls or text/thinking blocks).
    """
    extra_headers = (
        {"anthropic-beta": "interleaved-thinking-2025-05-14"} if interleaved else None
    )
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2512,
        tools=[calculator_tool, database_tool],
        thinking={"type": "enabled", "budget_tokens": 2000},
        extra_headers=extra_headers,
        messages=msg_history,
    )
    return response.content

# === MAIN CHAIN ===
@weave.op()
def run_claude_tools_full_chain(prompt_text=PROMPT, interleaved=False, max_loops=5):
    """
    Main loop: calls single-step Claude; if it makes a tool call, executes it
    and calls Claude again, and so on.
    Each step is a tracked @weave.op for graph lineage.
    Returns the full set of blocks and tool calls/results for inspection in Weave.
    """
    all_steps = []
    msg_history = [{"role": "user", "content": prompt_text}]
    for loop in range(max_loops):
        assistant_blocks = claude_message_op(msg_history, interleaved=interleaved)
        all_steps.append({"assistant_blocks": assistant_blocks, "msg_history": list(msg_history)})
        tool_results = execute_tools(assistant_blocks)
        if not tool_results:
            break
        msg_history.append({"role": "assistant", "content": assistant_blocks})
        msg_history.append({"role": "user", "content": tool_results})
        all_steps[-1]["tool_results"] = tool_results
    return all_steps

def print_blocks(blocks):
    for block in blocks:
        if block.type == "thinking":
            print(f"[THINKING BLOCK]: {block.thinking}\n{'-'*60}")
        elif block.type == "tool_use":
            print(f"[TOOL USE BLOCK]: {block.name} with input {block.input} (id={block.id}) {'-'*40}")
        elif block.type == "text":
            print(f"[RESPONSE BLOCK]:\n{block.text}\n{'-'*60}")



# interleaved = True

steps = run_claude_tools_full_chain(
    prompt_text=PROMPT,
    interleaved=True,
    max_loops=5,
)

print("\n====== Full Conversation Step Trace ======")
for i, step in enumerate(steps):
    print(f"\n----- Assistant Step {i+1} -----")
    print_blocks(step["assistant_blocks"])
    if "tool_results" in step:
        print(f"\nTOOL RESULTS: {step['tool_results']}")

# Optionally, print the "final" response blocks (the last 'assistant_blocks' returned)
final_blocks = steps[-1]["assistant_blocks"]
print("\n--- Final Assistant Response ---\n")
print_blocks(final_blocks)


# interleaved = False

steps = run_claude_tools_full_chain(
    prompt_text=PROMPT,
    interleaved=False,
    max_loops=5,
)

print("\n====== Full Conversation Step Trace ======")
for i, step in enumerate(steps):
    print(f"\n----- Assistant Step {i+1} -----")
    print_blocks(step["assistant_blocks"])
    if "tool_results" in step:
        print(f"\nTOOL RESULTS: {step['tool_results']}")

# Optionally, print the "final" response blocks (the last 'assistant_blocks' returned)
final_blocks = steps[-1]["assistant_blocks"]
print("\n--- Final Assistant Response ---\n")
print_blocks(final_blocks)
When running with interleaved=True, Claude alternates between its own internal reasoning, tool calls, tool results, and final textual responses. Instead of planning out all tool calls in a single response, the model works through the problem step by step. This approach not only increases transparency, it also allows Claude to reflect on and incorporate the results of tool calls as it continues reasoning, providing a clear window into how the answer is assembled.
The Python code manages this orchestration by first registering a pair of custom tools: a calculator, implemented via safe_eval(), and a simulated database query interface, handled by handle_db_command(). Each tool is described not only by its function but also by an explicit input schema, ensuring that Claude knows what kind of arguments it can provide when calling these functions.
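As a quick sanity check, the safe_eval function from the listing can be exercised on its own, outside any Claude call. The copy below is the same logic as above; note that on success it returns the numeric result directly, while rejected or failing expressions come back as an "Error: ..." string:

```python
import ast

def safe_eval(expr):
    """Evaluate an arithmetic expression, rejecting any non-arithmetic syntax."""
    try:
        node = ast.parse(expr, mode='eval')
        for sub in ast.walk(node):
            # Only arithmetic node types are allowed; anything else (calls,
            # names, attribute access, ...) raises before eval() ever runs.
            if not isinstance(sub, (ast.Expression, ast.BinOp, ast.UnaryOp, ast.Num, ast.Load, ast.operator, ast.unaryop, ast.Constant)):
                raise ValueError("Unsafe")
        return eval(expr, {"__builtins__": {}})
    except Exception as e:
        return f"Error: {e}"

print(safe_eval("150 * 50"))           # plain arithmetic: evaluates to 7500
print(safe_eval("__import__('os')"))   # contains a Call node: rejected as an error
```

Running something like `safe_eval("__import__('os')")` confirms the allowlist blocks function calls before eval() is ever reached, which is what makes it reasonable to expose as a model-facing calculator tool.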
After running the script, you'll see the results inside Weave:


Inside Weave, you can visualize both the high-level call to run_claude_tools_full_chain—such as prompt, tool usage, and assistant decisions—as well as every individual reasoning ('thinking'), tool call, and textual response block generated by Claude. This gives you a complete semantic trace and lets you debug, compare, and audit conversations at every step.

Conclusion

In just a few steps, you’ve set up Claude 4 with the Anthropic Python SDK and integrated it with W&B Weave for full visibility into every aspect of your model’s reasoning and tool use. This workflow lets you trace every interaction, tool call, and intermediate step - making it easy to debug, optimize, and understand exactly how Claude arrives at its answers.
With this foundation, you can reliably build, test, and monitor complex AI workflows powered by tool-using large language models. Continue developing, analyzing, and refining your pipelines as your projects evolve. For further tips and advanced usage, check out the materials below.

Iterate on AI agents and models faster. Try Weights & Biases today.