Adding observability and tracing to your Bedrock AgentCore Agents
A practical guide to running production AI agents on AWS AgentCore and wiring them to Weave for step-by-step observability, with examples showing setup, instrumentation, and trace-driven debugging.
AgentCore is AWS’s framework for building, running, and scaling AI agents in production. It provides the runtime environment, identity and memory services, and built-in tools like the Browser, Code Interpreter, and Gateway so developers can focus on agent logic without managing infrastructure. It also supports multiple agent frameworks and models, offers secure isolated execution, and integrates with AWS services for authentication, data access, and deployment.
Weave, from Weights & Biases, is a monitoring and observability platform for AI applications. It records detailed execution traces, capturing every function call (inputs, outputs, metadata, exceptions) and linking them together into a navigable tree. It works out of the box with many LLM libraries and can also track custom application logic via decorators. By combining these traces with metrics like latency, token counts, and cost, Weave helps developers debug, optimize, and evaluate AI systems.
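To make that concrete, here is a minimal sketch of what Weave instrumentation looks like on its own, before any AgentCore code is involved (the project name and function below are placeholders, not part of either product):

import weave

# Initialize Weave once per process; "your_entity/your_project" is a placeholder.
weave.init("your_entity/your_project")

# Any function decorated with @weave.op is traced automatically: its inputs,
# outputs, exceptions, and timing are captured and linked into the call tree
# shown in the Weave UI.
@weave.op()
def summarize(text: str) -> str:
    return text[:100]

summarize("Weave records this call, its input, and its output.")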
When used in tandem, AgentCore provides the environment for running production agents, while Weave provides the lens to observe exactly how those agents behave. This means you can see every reasoning step, every tool call, and every API interaction your agent performs inside AgentCore, all captured in Weave’s trace view. For teams building complex agents that may call multiple tools, chain reasoning steps, or interact with external APIs, this pairing gives both operational reliability and development visibility.
In the rest of this article, we’ll walk through how to integrate Weave into an AgentCore agent runtime. You’ll see how to initialize Weave inside your agent, instrument both built-in tools and custom logic, and send execution traces so you can monitor and debug your AgentCore-powered agents with the same granularity you’d expect from any other modern AI application.

Table of contents:
What is Bedrock AgentCore?
Why agent observability is important
Getting started with AgentCore
Conclusion
What is Bedrock AgentCore?
AgentCore is an AWS platform designed to host and operate AI agents in a secure, scalable, and flexible way. It is not a single monolithic service but a collection of components that you can use together or individually, depending on your needs. At its core is the AgentCore Runtime, a managed execution environment that runs agents or tools in isolated containers. This isolation keeps each agent session secure while also enabling long-running tasks of up to eight hours, without you having to manage the underlying servers.
AgentCore supports multiple agent frameworks such as LangGraph, CrewAI, and Strands Agents, and it works with any large language model you choose. The runtime can handle large payloads and multimodal inputs, and it integrates with AWS authentication so agents can securely interact with AWS resources and external APIs.
Beyond the runtime, AgentCore includes services for managing identity, memory, and observability. The identity service controls authentication and authorization, making it straightforward to enforce least-privilege access. The memory service gives agents short-term and long-term storage for context, allowing them to carry state across conversations or share knowledge across agents.
It also ships with built-in tools such as the Browser tool for retrieving data from the web, the Code Interpreter for executing code in isolated environments, and the Gateway for turning APIs into agent-callable functions. Observability features let you trace and debug agents, integrate with monitoring systems, and track performance over time. In short, AgentCore gives you the infrastructure and services to move an AI agent from prototype to production without having to build and maintain the supporting systems yourself.
Why agent observability is important
Observability is the ability to understand exactly what is happening inside a system by examining the data it produces. For AI agents, this means having visibility into every step the agent takes, every decision it makes, and every interaction it has with external systems. Without observability, issues can remain hidden until they cause failures that are difficult to diagnose and fix. In an AgentCore environment, agents can be complex, chaining reasoning steps, calling multiple tools, processing data from different sources, and interacting with various APIs. If something goes wrong, you need a clear record of what the agent did and why it did it, and observability provides that record by capturing inputs, outputs, timings, errors, and other metadata.
This visibility makes debugging more efficient and reduces the time it takes to find and fix problems, but it is also valuable for performance and cost optimization. By monitoring how agents use tools, how long steps take, and how often certain patterns occur, you can identify opportunities to streamline workflows or reduce unnecessary calls to expensive models. For production systems, observability is also a foundation for trust. If your agents are making decisions that affect users or important business processes, you need a transparent view of their behavior, and being able to trace their execution helps ensure they are working as intended and meeting compliance or governance requirements.
With AgentCore’s built-in observability features and external tools like Weave, you can combine runtime-level metrics with rich execution traces to create a complete picture of your agents’ behavior, enabling both rapid troubleshooting and continuous improvement.
Getting started with AgentCore
In this section, we build a basic AgentCore agent using the Strands framework and a couple of simple tools, then deploy it to AgentCore. This gives you a minimal but functional starting point for running agents in AWS’s managed environment. For a step-by-step guide on configuring AWS and deploying your agent to AgentCore, check out the tutorial here.
First we create the agent without any observability enabled so you can see the baseline setup:
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from typing import Dict, Any
from datetime import datetime

from strands import Agent, tool
from strands.models import BedrockModel

### --- Define your tools
@tool
def word_count(text: str) -> int:
    """Returns the number of words in the text."""
    return len(text.split())

@tool
def reverse(text: str) -> str:
    """Reverses the input string."""
    return text[::-1]

### --- Configure your Bedrock model provider
bedrock_model = BedrockModel(
    model_id="anthropic.claude-3-5-sonnet-20240620-v1:0",
    temperature=0.3,
    streaming=False,  # Set to True if you want streaming output
    region_name="us-east-1",  # <--- your AWS region
)

### --- Initialize the agent WITH tools
strands_agent = Agent(
    model=bedrock_model,
    tools=[word_count, reverse],
    system_prompt="You are a helpful assistant who uses tools when they help.",
)

### --- FastAPI app setup
app = FastAPI(title="Strands Agent Server", version="1.0.0")

class InvocationRequest(BaseModel):
    input: Dict[str, Any]

class InvocationResponse(BaseModel):
    output: Dict[str, Any]

@app.post("/invocations", response_model=InvocationResponse)
async def invoke_agent(request: InvocationRequest):
    try:
        user_message = request.input.get("prompt", "")
        if not user_message:
            raise HTTPException(
                status_code=400,
                detail="No prompt found in input. Please provide a 'prompt' key in the input.",
            )

        # Call the strands agent synchronously
        result = strands_agent(user_message)

        response = {
            "message": result.message,  # agent reply object
            "timestamp": datetime.utcnow().isoformat(),
            "model": bedrock_model.config["model_id"],
        }
        return InvocationResponse(output=response)
    except HTTPException:
        # Re-raise client errors (such as the 400 above) unchanged
        raise
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"Agent processing failed: {str(e)}")

@app.get("/ping")
async def ping():
    return {"status": "healthy"}

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8080)
Overall, this code defines an API that wraps an AI agent with two tools: word_count, which returns the number of words in a string, and reverse, which flips a string. It configures an Anthropic Claude 3.5 Sonnet model on Bedrock, then builds a Strands Agent that can call those tools. A FastAPI server exposes two endpoints. POST /invocations reads request.input.prompt, runs the agent, and returns the agent’s reply, a timestamp, and the model ID; it raises 400 when the prompt is missing and 500 on other errors. GET /ping returns a simple health check, and the main block runs uvicorn.
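Before wiring in any observability, it helps to confirm the server behaves as described. Here is a quick local smoke test, assuming the app is running on localhost:8080 as in the __main__ block above (once deployed, you invoke the agent through the AgentCore runtime rather than hitting the container directly):

import requests

BASE_URL = "http://localhost:8080"  # local development only

# Health check
print(requests.get(f"{BASE_URL}/ping").json())  # expected: {'status': 'healthy'}

# Invoke the agent with a prompt that should exercise both tools
payload = {"input": {"prompt": "How many words are in 'hello observable world', and what is it reversed?"}}
resp = requests.post(f"{BASE_URL}/invocations", json=payload)
resp.raise_for_status()
print(resp.json()["output"]["message"])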
Now we will add observability to our agent. AWS provides integrated options for monitoring agents, but I prefer Weave for its detailed trace capture, its interface, and the overall developer experience. The dashboard is polished, customizable, and purpose built for exploring traces, comparing runs, and drilling into details quickly.
It works out of the box with LLM providers like Anthropic and OpenAI using the Weave SDK, and the workflow in local development is identical to what you get when deployed within Agent Runtime, which makes it easy to move from development to production. It automatically captures inputs, outputs, metadata, token usage, and exceptions, and logs them to a backend-agnostic dashboard, whereas AWS logs are primarily viewed inside the CloudWatch console.
When deployed in Agent Runtime, Weave gives you flexibility. You can use the native Weave SDK with @weave.op and its UI, or integrate through OpenTelemetry, which AgentCore already uses under the hood. This dual option means you can start with Weave’s own dashboard and later decide if you want to blend it with other OpenTelemetry based observability systems. It also avoids heavy vendor lock in compared to AWS only tooling, giving you a more portable and user friendly way to monitor GenAI applications. Do not just take my word for it; one excellent way to know for sure which solution works best is to deploy both options in your environment, run them for a while, and compare the experience before committing to one for the long term.
There are a few different ways to add Weave to your AgentCore agents. The most direct approach is to decorate the functions or methods you want to track with @weave.op, initialize Weave in your agent code, and make sure the weave Python package is available in your runtime. AgentCore agents are deployed as Docker containers, so if you manage dependencies with uv, run uv add weave so the package is recorded in your pyproject.toml and uv.lock, and ensure your Dockerfile installs those dependencies so Weave is present in the image before deployment.
import os
from datetime import datetime
from typing import Dict, Any

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

import weave
from strands import Agent, tool
from strands.models import BedrockModel

# Export WANDB key for Weave auth
os.environ["WANDB_API_KEY"] = "your_api_key"
WEAVE_PROJECT = os.getenv("WEAVE_PROJECT", "your_wandb_username/wand_project_name")
weave.init(WEAVE_PROJECT)

@weave.op()
def word_count_op(text: str) -> int:
    return len(text.split())

@weave.op()
def reverse_op(text: str) -> str:
    return text[::-1]

def get_agent():
    @tool
    def word_count(text: str) -> int:
        return word_count_op(text)

    @tool
    def reverse(text: str) -> str:
        return reverse_op(text)

    bedrock_model = BedrockModel(
        model_id="anthropic.claude-3-5-sonnet-20240620-v1:0",
        temperature=0.3,
        streaming=False,
        region_name="us-east-1",
    )
    return Agent(
        model=bedrock_model,
        tools=[word_count, reverse],
        system_prompt="You are a helpful assistant who uses tools when they help.",
    )

@weave.op()
def run_agent(agent: Agent, user_message: str) -> Dict[str, Any]:
    result = agent(user_message)
    return {
        "message": result.message,
        "model": agent.model.config["model_id"],
    }

app = FastAPI()

class InvocationRequest(BaseModel):
    input: Dict[str, Any]

class InvocationResponse(BaseModel):
    output: Dict[str, Any]

@app.post("/invocations", response_model=InvocationResponse)
async def invoke_agent(body: InvocationRequest):
    agent = get_agent()
    user_message = body.input.get("prompt", "")
    if not user_message:
        raise HTTPException(400, "No prompt found; provide 'prompt' key.")
    result = run_agent(agent, user_message)
    response = {
        "message": result["message"],
        "timestamp": datetime.utcnow().isoformat(),
        "model": result["model"],
    }
    return InvocationResponse(output=response)

@app.get("/ping")
async def ping():
    return {"status": "healthy"}
Here Weave is initialized with weave.init, and functions are decorated with @weave.op to automatically capture calls. All inputs, outputs, and tool calls are logged in Weave, so you can see exactly what happened in each run without creating spans manually. This approach is simpler to instrument and gives you richer, structured traces out of the box.
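If you also want each trace tagged with request-level metadata, such as a session or user ID pulled from the AgentCore request, Weave supports attaching attributes around a call. Here is a small sketch, assuming the weave.attributes context manager available in recent Weave releases; the keys shown are arbitrary examples:

import weave

weave.init("your_entity/your_project")  # placeholder project name

@weave.op()
def run_agent_traced(user_message: str) -> str:
    # Stand-in for the run_agent op shown above
    return f"echo: {user_message}"

# Every op executed inside this context is logged with these attributes,
# which you can then filter and group by in the Weave UI.
with weave.attributes({"session_id": "abc-123", "environment": "agentcore-dev"}):
    run_agent_traced("How many words are in this sentence?")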
Weave can also work with OpenTelemetry, which is a standard for sending traces and metrics from your application to different backends. This lets you connect Weave’s detailed execution traces with other observability systems or dashboards you already use.
import base64
import functools
import json
import os
from datetime import datetime
from typing import Dict, Any

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

from strands import Agent, tool
from strands.models import BedrockModel

# ─── Weave OTLP setup ────────────────────────────────────────────────────────
WANDB_API_KEY = os.getenv("WANDB_API_KEY", "your_api_key")
WEAVE_PROJECT = "your_wandb_username/wand_project_name"

auth_b64 = base64.b64encode(f"api:{WANDB_API_KEY}".encode()).decode()
exporter = OTLPSpanExporter(
    endpoint="https://trace.wandb.ai/otel/v1/traces",
    headers={"Authorization": f"Basic {auth_b64}", "project_id": WEAVE_PROJECT},
)
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(exporter))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("strands-agent")
# ─────────────────────────────────────────────────────────────────────────────

def tool_logger(tool_calls):
    """Decorator factory that records each tool call's name, input, and output."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            result = fn(*args, **kwargs)
            tool_calls.append({
                "tool_name": fn.__name__,
                "tool_input": {"args": args, "kwargs": kwargs},
                "tool_output": result,
            })
            return result
        return wrapper
    return decorator

def get_agent(tool_calls):
    # Wrap each tool with tool_logger so its name, arguments, and result are
    # captured in the per-request tool_calls list.
    @tool
    @tool_logger(tool_calls)
    def word_count(text: str) -> int:
        return len(text.split())

    @tool
    @tool_logger(tool_calls)
    def reverse(text: str) -> str:
        return text[::-1]

    bedrock_model = BedrockModel(
        model_id="anthropic.claude-3-5-sonnet-20240620-v1:0",
        temperature=0.3,
        streaming=False,
        region_name="us-east-1",
    )
    return Agent(
        model=bedrock_model,
        tools=[word_count, reverse],
        system_prompt="You are a helpful assistant who uses tools when they help.",
    )

app = FastAPI()

class InvocationRequest(BaseModel):
    input: Dict[str, Any]

class InvocationResponse(BaseModel):
    output: Dict[str, Any]

@app.post("/invocations", response_model=InvocationResponse)
async def invoke_agent(body: InvocationRequest):
    tool_calls = []
    agent = get_agent(tool_calls)
    user_message = body.input.get("prompt", "")
    if not user_message:
        raise HTTPException(400, "No prompt found; provide 'prompt' key.")

    with tracer.start_as_current_span("invoke_agent") as span:
        span.set_attribute("input.value", json.dumps({"prompt": user_message}))
        result = agent(user_message)
        span.set_attribute("output.value", json.dumps({
            "message": result.message,
            "model": agent.model.config["model_id"],
            "tool_calls": tool_calls,
        }, default=str))

    response = {
        "message": result.message,
        "timestamp": datetime.utcnow().isoformat(),
        "model": agent.model.config["model_id"],
        "tool_calls": tool_calls,
    }
    return InvocationResponse(output=response)

@app.get("/ping")
async def ping():
    return {"status": "healthy"}
This version sets up OpenTelemetry to export spans directly to Weave’s OTLP endpoint. The W&B API key is base64-encoded for the OTLPSpanExporter, and a tracer is created for tagging spans. Each tool is wrapped with tool_logger so its name, input, and output are stored in a list. Inside invoke_agent, a span records the request input and output so they are searchable in Weave. You get AgentCore compatibility while keeping logging backend-agnostic, meaning the agent runs normally inside AgentCore but the telemetry is not locked to AWS’s CloudWatch.
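If you would rather have each tool call appear as its own node in the trace tree instead of a list attached to the parent span, the same tracer can emit a child span per call. Here is a sketch of that variation, reusing the tracer, functools, and json imports from the listing above rather than standing on its own:

def tool_logger(tool_calls):
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            # The child span nests under the active "invoke_agent" span, so each
            # tool call becomes its own node in the Weave trace tree.
            with tracer.start_as_current_span(f"tool.{fn.__name__}") as span:
                span.set_attribute("input.value", json.dumps({"args": args, "kwargs": kwargs}, default=str))
                result = fn(*args, **kwargs)
                span.set_attribute("output.value", json.dumps(result, default=str))
            tool_calls.append({
                "tool_name": fn.__name__,
                "tool_input": {"args": args, "kwargs": kwargs},
                "tool_output": result,
            })
            return result
        return wrapper
    return decorator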
Once integrated, you can visualize your agents directly in Weave’s UI. Each call, tool invocation, and reasoning step appears in a tree view, letting you expand into details like inputs, outputs, latency, and token usage. You can click through to see exactly how your agent arrived at a decision or debug an unexpected output by following the chain of calls in real time or from past runs. This visibility makes it much easier to iterate on prompts, tune tool behavior, and understand the impact of changes to your agent’s logic.
Here are some screenshots of the observability dashboard inside Weave.

Here we can see not only the inputs and outputs, but also the full breakdown of each intermediate step the agent took to produce its answer. This includes tool invocations, their arguments, and the exact outputs returned by each tool. Latency metrics for each call are visible, making it easier to identify slow components. Token usage per step is also available, which helps manage cost and optimize prompts. By combining these details, you gain an end-to-end picture of your agent’s execution flow, enabling precise debugging and performance tuning.
Conclusion
Integrating Weave into your AgentCore workflow makes observability a natural part of running agents. Once traces appear in the dashboard, you can see exactly how an agent processes requests, calls tools, and produces results. Every step is recorded with inputs, outputs, and timing, so debugging becomes a focused process rather than a hunt through scattered logs. This visibility also makes tuning more precise, because you can trace the impact of each change and compare runs side by side.
Over time, these traces reveal patterns in how your agents operate. You might spot bottlenecks in tool execution, inconsistencies in outputs, or unnecessary calls that drive up costs. With Weave, you can drill into the exact moment an issue occurs and adjust your logic with confidence. Whether used alone or alongside AWS’s own monitoring, it gives you a clear and continuous view into your agents’ behavior, making it easier to keep them reliable while evolving their capabilities.