
Pypestream Weave Onboarding Guide

Access Weave by Weights & Biases (W&B) here: https://app.wandb.ai/login
To obtain access to a specific team, please contact Elana Feldman (efeldman@pypestream.com)


About Weave 💫

Weave is an LLMOps platform by Weights & Biases (W&B) built to facilitate collaboration and reproducibility across the machine learning development lifecycle. Machine learning projects can quickly become a mess without best practices in place to aid developers and scientists as they iterate on models and move them to production.
Weave is lightweight enough to work with whatever framework or platform teams are currently using, but enables teams to quickly start logging their important results to a central system of record. W&B also offers built-in visualization, automation, and documentation capabilities for better debugging, model tuning, and project management.


Getting Access to Weave

If you do not already have a W&B account, please ask an admin to send you an invite.
Our suggestion is to create a team that you can then log projects to. This will need to be done by an admin.

Environment Variables

Setting the following environment variables will make things easier and saves you from having to specify the host and API key explicitly, as detailed further down in this document.
import os

os.environ["WANDB_BASE_URL"] = "https://api.wandb.ai"
os.environ["WANDB_API_KEY"] = "your-wandb-api-key"
Your API key is available at https://wandb.ai/authorize

W&B Installation & Authentication

To start using W&B Weave, you first need to install the Python package (if not already installed):
pip install weave
Once installed, authenticate your W&B user account by logging in through the CLI or SDK. You should have received an email inviting you to the platform, after which you can obtain your API token (available in the "Settings" section under your profile):
wandb login --host <YOUR W&B HOST URL> <YOUR API TOKEN>
OR through Python:
import os
import wandb

wandb.login(host=os.getenv("WANDB_BASE_URL"), key=os.getenv("WANDB_API_KEY"))
In headless environments, you can instead define the WANDB_API_KEY environment variable.
Once you are logged in, you are ready to track your workflows!

Track and evaluate GenAI applications via W&B Weave



Weave is a lightweight toolkit for tracking and evaluating GenAI applications. The goal is to bring rigor, best practices, and composability to the inherently experimental process of developing GenAI applications, without introducing cognitive overhead.

Weave can be used to:
  • Log and debug model inputs, outputs, and traces
  • Build rigorous, apples-to-apples evaluations for language model use cases
  • Capture valuable feedback that can be used to build new training and evaluation sets
  • Organize all the information generated across the LLM workflow, from experimentation to evaluations to production
A quick-start guide to Weave can be found here.

Get started with W&B Weave - Basic Tracing

Once you have authenticated with W&B, you can start by creating a Weave project with the following command:
import weave
weave.init('<entity-name>/<project-name>')
# this ensures the project is created in the relevant team.
# If you have WANDB_ENTITY and WANDB_PROJECT set as env vars then you won't need to specify these.
# You may need to create the team.
Now you can decorate the functions you want to track by adding the one-line weave.op() decorator.
Here's what an example script looks like (feel free to copy/paste this into your IDE and run it):
import weave
from openai import OpenAI

client = OpenAI()

# Weave will track the inputs, outputs and code of this function
@weave.op()
def extract_dinos(sentence: str) -> dict:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": """In JSON format extract a list of `dinosaurs`, with their `name`,
their `common_name`, and whether its `diet` is a herbivore or carnivore""",
            },
            {
                "role": "user",
                "content": sentence,
            },
        ],
        response_format={"type": "json_object"},
    )
    return response.choices[0].message.content


# Initialize the weave project
weave.init('<team-name>/jurassic-park')

sentence = """I watched as a Tyrannosaurus rex (T. rex) chased after a Triceratops (Trike), \
both carnivore and herbivore locked in an ancient dance. Meanwhile, a gentle giant \
Brachiosaurus (Brachi) calmly munched on treetops, blissfully unaware of the chaos below."""

result = extract_dinos(sentence)
print(result)

Integrations

Weave provides automatic logging integrations for popular LLM providers and orchestration frameworks. These integrations allow you to seamlessly trace calls made through various libraries, enhancing your ability to monitor and analyze your AI applications, even without explicitly using the weave.op() decorator. We integrate with the following LLM Providers & Frameworks:
  • Amazon Bedrock
  • Anthropic
  • Cerebras
  • Cohere
  • Google
  • LiteLLM
  • MistralAI
  • OpenAI
  • OpenAI Agents SDK
  • LangChain
  • LlamaIndex
And more! Read the full list here.

Example Integration Usage

Here, we're automatically tracking all calls to OpenAI.
from openai import OpenAI

import weave

# Replace with your own "<entity-name>/<project-name>" string (reused in later examples)
PROJECT = "<entity-name>/<project-name>"

weave.init(PROJECT)

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "system",
            "content": "You are a grammar checker, correct the following user input.",
        },
        {"role": "user", "content": "That was so easy, it was a piece of pie!"},
    ],
    temperature=0,
)
generation = response.choices[0].message.content
print(generation)


Spinning up an Evaluation

Evaluation-driven development helps you reliably iterate on an application. The Evaluation class is designed to assess the performance of a Model on a given Dataset or set of examples using scoring functions.
See a preview of the API below:
import asyncio

import weave
from weave import Evaluation

# Define any custom scoring function
@weave.op()
def exact_match(expected: str, output: dict) -> dict:
    # Here is where you'd define the logic to score the model output
    return {"match": expected == output}


# Score your examples using scoring functions
evaluation = Evaluation(
    dataset=dataset,  # can be a list of dictionaries or a weave.Dataset object
    scorers=[exact_match],  # can be a list of scoring functions
)

# Start tracking the evaluation
weave.init(PROJECT)
# Run the evaluation (`corrector` is the model defined in the Tracking Models section below)
print(asyncio.run(evaluation.evaluate(corrector)))

# if you're in a Jupyter Notebook, run:
# await evaluation.evaluate(corrector)
Follow the Build an Evaluation pipeline tutorial to learn more about Evaluation and begin iteratively improving your applications.


Tracking Objects

Organizing experimentation is difficult when there are many moving pieces. You can capture and organize the experimental details of your app, like your system prompt or the model you're using, as Weave Objects. This helps you organize and compare different iterations of your app. In this section we will cover prompts, models, and datasets; you can view additional documentation here.

Prompt Tracking

Creating, evaluating, and refining prompts is a core activity for AI engineers. Small changes to a prompt can have big impacts on your application's behavior. Weave lets you create prompts, save and retrieve them, and evolve them over time.
Weave is unopinionated about how a Prompt is constructed. If your needs are simple, you can use the built-in weave.StringPrompt or weave.MessagesPrompt classes. If your needs are more complex, you can subclass those (or the base class weave.Prompt) and override the format method; a short sketch follows below.
When you publish one of these objects with weave.publish, it will appear in your Weave project on the "Prompts" page.
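For the more complex case, here is a minimal sketch of subclassing weave.StringPrompt and overriding format; the class name and the upper-casing behavior are hypothetical:
import weave

weave.init('intro-example')

class ShoutingPrompt(weave.StringPrompt):
    # Hypothetical subclass: upper-case the rendered prompt after filling placeholders
    def format(self, **kwargs) -> str:
        return super().format(**kwargs).upper()

prompt = ShoutingPrompt("Talk like a pirate about {topic}.")
weave.publish(prompt, name="shouting_pirate_prompt")
print(prompt.format(topic="general relativity"))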

StringPrompt

import weave
weave.init('intro-example')

system_prompt = weave.StringPrompt("You are a pirate")
weave.publish(system_prompt, name="pirate_prompt")

from openai import OpenAI
client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "system",
            "content": system_prompt.format()
        },
        {
            "role": "user",
            "content": "Explain general relativity in one paragraph."
        }
    ],
)
Perhaps this prompt does not yield the desired effect, so we modify the prompt to be more clearly instructive.

import weave
weave.init('intro-example')

system_prompt = weave.StringPrompt("Talk like a pirate. I need to know I'm listening to a pirate.")
weave.publish(system_prompt, name="pirate_prompt")

from openai import OpenAI
client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "system",
            "content": system_prompt.format()
        },
        {
            "role": "user",
            "content": "Explain general relativity in one paragraph."
        }
    ],
)
When viewing this prompt object in the UI, you can see that it has two versions.


You can also select them for comparison to see exactly what changed.


MessagesPrompt

The MessagesPrompt can be used to replace an array of Message objects.
import weave
weave.init('intro-example')

prompt = weave.MessagesPrompt([
    {
        "role": "system",
        "content": "You are a stegosaurus, but don't be too obvious about it."
    },
    {
        "role": "user",
        "content": "What's good to eat around here?"
    }
])
weave.publish(prompt, name="dino_prompt")

from openai import OpenAI
client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=prompt.format(),
)

Parameterizing prompts

As the format method's name suggests, you can pass arguments to fill in template placeholders in the content string.
import weave
weave.init('intro-example')

prompt = weave.StringPrompt("Solve the equation {equation}")
weave.publish(prompt, name="calculator_prompt")

from openai import OpenAI
client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": prompt.format(equation="1 + 1 = ?")
        }
    ],
)

Tracking Models



Models are such a common object type that we have a special class to represent them: weave.Model. The only requirement is that you define a predict method.
from openai import OpenAI

import weave

weave.init(PROJECT)


class OpenAIGrammarCorrector(weave.Model):
    # Properties are entirely user-defined
    openai_model_name: str
    system_message: str

    @weave.op()
    def predict(self, user_input):
        client = OpenAI()
        response = client.chat.completions.create(
            model=self.openai_model_name,
            messages=[
                {"role": "system", "content": self.system_message},
                {"role": "user", "content": user_input},
            ],
            temperature=0,
        )
        return response.choices[0].message.content


corrector = OpenAIGrammarCorrector(
    openai_model_name="gpt-4o-mini",
    system_message="You are a grammar checker, correct the following user input.",
)

result = corrector.predict("That was so easy, it was a piece of pie!")
print(result)


Tracking Datasets
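
Weave Datasets let you collect and version examples, such as evaluation rows, alongside your other Weave objects. Below is a minimal sketch, assuming the weave.Dataset class and the intro-example project used elsewhere in this guide; the rows themselves are illustrative.
import weave
from weave import Dataset

weave.init('intro-example')

# Create a small dataset of example rows (illustrative data)
dataset = Dataset(
    name='grammar',
    rows=[
        {'id': '0', 'sentence': "He no likes ice cream.", 'correction': "He doesn't like ice cream."},
        {'id': '1', 'sentence': "They goes to the store.", 'correction': "They go to the store."},
    ],
)

# Publish the dataset so it is versioned and visible in the Weave UI
weave.publish(dataset)

# Retrieve the latest version and access a row
dataset_ref = weave.ref('grammar').get()
print(dataset_ref.rows[0]['sentence'])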

Retrieve Published Objects & Ops

You can publish objects and then retrieve them in your code. You can even call functions from your retrieved objects!


import weave

weave.init(PROJECT)

# Note: this url is available from the UI after publishing the object!
ref_url = f"weave:///{ref.entity}/{PROJECT}/object/{ref.name}:{ref.digest}"
fetched_collector = weave.ref(ref_url).get()

# Notice: this object was loaded from remote location!
result = fetched_collector.predict("That was so easy, it was a piece of pie!")

print(result)
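If you prefer to construct the reference in code rather than copying the URL from the UI, weave.publish returns a reference object; a minimal sketch, assuming the corrector model from the Tracking Models section and that the returned reference exposes the entity, name, and digest fields used in the URL template above:
import weave

weave.init(PROJECT)

# Publishing returns a reference; its fields can be used to build the weave:/// URI above
ref = weave.publish(corrector, name="grammar-corrector")
print(ref.entity, ref.name, ref.digest)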

Adding Programmatic Feedback

Weave provides an integrated feedback system, allowing users to provide call feedback directly through the UI or programmatically via the SDK. Various feedback types are supported, including emoji reactions, textual comments, and structured data, enabling teams to:
  • Build evaluation datasets for performance monitoring.
  • Identify and resolve LLM content issues effectively.
  • Gather examples for advanced tasks like fine-tuning.
You can find SDK usage examples for feedback in the UI under the Use tab in the call details page.

Provide feedback via the UI

In the Weave UI, you can add and view feedback from the call details page. You can add or remove a reaction and add a note using the icons located in two places:
  • Call table: in the Feedback column of the appropriate row.
  • Call details page: in the upper right corner of the page.
To add a reaction:
  1. Click the emoji icon.
  2. Add a thumbs up, thumbs down, or click the + icon for more emojis.


Provide feedback via the SDK

You can use the Weave SDK to programmatically add, remove, and query feedback on calls.
Query a project's feedback
You can query the feedback for your Weave project using the SDK. The SDK supports the following feedback query operations:
  • client.get_feedback(): Returns all feedback in a project.
  • client.get_feedback("<feedback_uuid>"): Returns a specific feedback object specified by <feedback_uuid> as a collection.
  • client.get_feedback(reaction="<reaction_type>"): Returns all feedback objects for a specific reaction type.
You can also get additional information for each feedback object in client.get_feedback():
  • id: The feedback object ID.
  • created_at: The creation time of the feedback object.
  • feedback_type: The type of feedback (reaction, note, custom).
  • payload: The feedback payload.
import weave
client = weave.init('intro-example')

# Get all feedback in a project
all_feedback = client.get_feedback()

# Fetch a specific feedback object by id.
# The API returns a collection, which is expected to contain at most one item.
one_feedback = client.get_feedback("<feedback_uuid>")[0]

# Find all feedback objects with a specific reaction. You can specify offset and limit.
thumbs_up = client.get_feedback(reaction="👍", limit=10)

# After retrieval, view the details of individual feedback objects.
for f in client.get_feedback():
    print(f.id)
    print(f.created_at)
    print(f.feedback_type)
    print(f.payload)

Add Feedback to a Call

You can add feedback to a call using the call's UUID. To use the UUID to get a particular call, retrieve it during or after call execution. The SDK supports the following operations for adding feedback to a call:
  • call.feedback.add_reaction("<reaction_type>"): Add one of the supported reaction types (emojis), such as 👍.
  • call.feedback.add_note("<note>"): Add a note.
  • call.feedback.add("<label>", <object>): Add a custom feedback <object> specified by <label>.
import weave
client = weave.init('intro-example')

call = client.get_call("<call_uuid>")

# Adding an emoji reaction
call.feedback.add_reaction("👍")

# Adding a note
call.feedback.add_note("this is a note")

# Adding custom key/value pairs.
# The first argument is a user-defined "type" string.
# Feedback must be JSON serializable and less than 1 KB when serialized.
call.feedback.add("correctness", { "value": 5 })

Add human annotations

Human annotations are supported in the Weave UI. To make human annotations, you must first create a Human Annotation scorer using either the UI or the API. Then, you can use the scorer in the UI to make annotations, and modify your annotation scorers using the API.

Create a human annotation scorer in the UI

To create a human annotation scorer in the UI, do the following:
  1. In the sidebar, navigate to Scorers.
  2. In the upper right corner, click + Create scorer.
  3. In the configuration page, set:
     • Scorer type to Human annotation
     • Name
     • Description
     • Type, which determines the type of feedback that will be collected, such as boolean or integer.
  4. Click Create scorer. Now, you can use your scorer to make annotations.
In the following example, a human annotator is asked to select which type of document the LLM ingested. As such, the Type selected for the scorer configuration is an enum containing the possible document types.



Create a human annotation scorer using the API

Human annotation scorers can also be created through the API. Each scorer is its own object, which is created and updated independently. To create a human annotation scorer programmatically, do the following:
  1. Import the AnnotationSpec class from weave.flow.annotation_spec.
  2. Use the publish method from weave to create the scorer.
In the following example, two scorers are created. The first scorer, Temperature, is used to score the perceived temperature of the LLM call. The second scorer, Tone, is used to score the tone of the LLM response. Each scorer is published with weave.publish and an associated object ID (temperature-scorer and tone-scorer).
import weave
from weave.flow.annotation_spec import AnnotationSpec

client = weave.init("feedback-example")

spec1 = AnnotationSpec(
    name="Temperature",
    description="The perceived temperature of the llm call",
    field_schema={
        "type": "number",
        "minimum": -1,
        "maximum": 1,
    },
)
spec2 = AnnotationSpec(
    name="Tone",
    description="The tone of the llm response",
    field_schema={
        "type": "string",
        "enum": ["Aggressive", "Neutral", "Polite", "N/A"],
    },
)
weave.publish(spec1, "temperature-scorer")
weave.publish(spec2, "tone-scorer")

Use a human annotation scorer using the API

The feedback API allows you to use a human annotation scorer by specifying a specially constructed name and an annotation_ref field. You can obtain the annotation_spec_ref from the UI by selecting the appropriate tab, or during the creation of the AnnotationSpec.
import weave

client = weave.init("feedback-example")

call = client.get_call("<call_id>")
annotation_spec = weave.ref("<annotation_spec_ref_uri>")

call.feedback.add(
    feedback_type="wandb.annotation." + annotation_spec.name,
    payload={"value": 1},
    annotation_ref=annotation_spec.uri(),
)
Additional documentation here

FAQs

This section provides answers to common questions about Weave tracing.

What information does Weave capture for a function?

How can I disable code capture?

How can I disable system information capture?

How can I disable client information capture?

Will Weave affect my function's execution speed?

How do I render Markdown in the UI?

Wrap your string with weave.Markdown(...) before saving, and use weave.publish(...) to store it. Weave uses the object’s type to determine rendering, and weave.Markdown maps to a known UI renderer. The value will be shown as a formatted Markdown object in the UI.
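A minimal sketch, assuming the weave.Markdown wrapper described above and the intro-example project name:
import weave

weave.init('intro-example')

# Wrapping the string in weave.Markdown tells the UI to render it as formatted Markdown
notes = weave.Markdown("# Eval notes\n\n*Accuracy* improved after switching to the v2 prompt.")
weave.publish(notes, name="eval-notes")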