
ZenBusiness Weave PoC Guide

One-stop shop for everything you need to test out during the W&B Pilot.
Created on February 6 | Last edited on February 19
Access Weights & Biases here: https://wandb.ai/trial-zenbusiness
💡 For any questions, please reach out via the Slack channel: #wandb-zenbusiness-trial


Weights and Biases (W&B) 💫

Weights & Biases is an MLOps platform built to facilitate collaboration and reproducibility across the machine learning development lifecycle. Machine learning projects can quickly become a mess without best practices in place to aid developers and scientists as they iterate on models and move them to production.
W&B is lightweight enough to work with whatever framework or platform teams are currently using, but enables teams to quickly start logging their important results to a central system of record. On top of this system of record, W&B has built visualization, automation, and documentation capabilities for better debugging, model tuning, and project management.

PoC Workshop Sessions

Date | Session | Recording Link
Jan 15, 2025 | W&B Weave Demo | https://us-39259.app.gong.io/e/c-share/?tkn=r1pszedo543n17lezknwk3rms


W&B Installation & Authentication

To start using W&B, you first need to install the Python packages (if they're not already installed):
pip install wandb weave
Once they're installed, authenticate your user account by logging in through the CLI or SDK. You should have received an email to sign up for the platform, after which you can obtain your API token (the API token is in the "Settings" section under your profile):
wandb login --host <YOUR W&B HOST URL> <YOUR API TOKEN>
OR through Python:
import os
import wandb

wandb.login(host=os.getenv("WANDB_BASE_URL"), key=os.getenv("WANDB_API_KEY"))
In headless environments, you can instead define the WANDB_API_KEY environment variable.
Once you are logged in, you are ready to track your workflows!

Use Cases / Test Cases

S No | Capability & Success Criteria
1 | Automate Evaluation Testing & Analysis
2 | Model Evaluation & Prompt Management
3 | Visibility into Input/Output Calls for RAG Pipelines
4 | Visibility into Latency, Token Usage & Cost



Test Case 1: Automate Evaluation Testing & Analysis

To iterate on an application, we need a way to evaluate if it's improving. To do so, a common practice is to test it against the same set of examples when there is a change. Evaluation-driven development helps you reliably iterate on an application.
Weave has a first-class way to track evaluations with Model & Evaluation classes. The Evaluation class is designed to assess the performance of a Model on a given Dataset or set of examples using scoring functions. We have built the APIs to make minimal assumptions to allow for the flexibility to support a wide array of use-cases.
This doc walks through more details on building a Model-Based Evaluation of RAG applications with W&B Weave.
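As a rough sketch of how an Evaluation run fits together (the project name, dataset rows, scorer, and model function below are all illustrative, not part of the PoC):

import asyncio
import weave

weave.init('trial-zenbusiness/eval-demo')  # hypothetical project name

# A tiny golden dataset; in practice, use your real examples
examples = [
    {"question": "What is the capital of France?", "expected": "Paris"},
    {"question": "What is 2 + 2?", "expected": "4"},
]

# Scorer: dataset column names map to scorer arguments,
# and `output` receives the model function's return value
@weave.op()
def match_score(expected: str, output: str) -> dict:
    return {"match": expected == output}

# Placeholder for the application under test
@weave.op()
def answer_question(question: str) -> str:
    return "Paris" if "France" in question else "4"

evaluation = weave.Evaluation(dataset=examples, scorers=[match_score])
asyncio.run(evaluation.evaluate(answer_question))

Each run logs per-example scores and aggregate metrics to the project, so re-running after a change gives a like-for-like comparison.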
Additionally, Weave has options to Compare Evaluations (visualizing evaluation scores to track key metrics like accuracy, coherence, and hallucinations) and a Leaderboard capability (leaderboard cookbook) to compare model performance across different datasets and scoring functions.

Test Case 2: Model Evaluation & Prompt Management

Weave provides the ability to log Models, Datasets, and Prompts as first-class Objects.
Objects form Weave's extensible serialization layer, automatically versioning runtime objects (often the inputs and outputs of Calls). This feature allows you to:
  • Track changes in data structures over time
  • Maintain a clear history of object modifications
  • Easily revert to previous versions when needed
So with Weave, you can log a "golden dataset" as a Weave Dataset, which becomes a centralized layer the whole team can easily access. Weave also provides the ability to log Prompts, letting you create a Prompt Hub for storing, sharing, and versioning prompts.
Below is an example script to log a StringPrompt (this docs page has more details on logging Prompts to Weave):
import weave
weave.init('intro-example')

system_prompt = weave.StringPrompt("You are a pirate")
weave.publish(system_prompt, name="pirate_prompt")

from openai import OpenAI
client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "system",
            "content": system_prompt.format()
        },
        {
            "role": "user",
            "content": "Explain general relativity in one paragraph."
        }
    ],
)
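Logging a golden dataset works the same way via weave.publish(). A minimal sketch (the dataset name and rows are illustrative):

import weave

weave.init('intro-example')

# Publish a small golden dataset; the team can then pull it by name
dataset = weave.Dataset(
    name="grammar-golden",
    rows=[
        {"id": "0", "sentence": "He no likes ice cream.", "correction": "He doesn't like ice cream."},
        {"id": "1", "sentence": "She goed to the store.", "correction": "She went to the store."},
    ],
)
weave.publish(dataset)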
By leveraging these versioning capabilities, you can gain deeper insights into your application's behavior, streamline your development process, and build more robust AI-powered systems.

Test Case 3: Visibility into Input/Output Calls for RAG Pipelines

Weave provides powerful tracing capabilities to track and version objects and function calls in your applications. This comprehensive system enables better monitoring, debugging, and iterative development of AI-powered applications, allowing you to "track insights between commits."
This tutorial walks through an example of tracing an application to track data flows and app metadata. Weave has native integrations with popular LLM providers that make logging traces very easy.
This facilitates End-to-end visibility into input/output across AI pipelines.
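As a sketch of what that looks like for a simple RAG pipeline (the project name and retrieval step are placeholders; in practice the retrieval op would query your vector store):

import weave
from openai import OpenAI

client = OpenAI()
weave.init('trial-zenbusiness/rag-demo')  # hypothetical project name

# Nested ops appear as a single trace tree in the Weave UI
@weave.op()
def retrieve(query: str) -> list[str]:
    # Placeholder retrieval step
    return ["ZenBusiness helps entrepreneurs start, run, and grow a business."]

@weave.op()
def generate_answer(query: str) -> str:
    context = "\n".join(retrieve(query))
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"Answer using only this context:\n{context}"},
            {"role": "user", "content": query},
        ],
    )
    return response.choices[0].message.content

print(generate_answer("What does ZenBusiness do?"))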

Test Case 4: Visibility into Latency, Token Usage & Cost

Weave automatically calculates costs based on the number of tokens used and the model used. Weave grabs this usage and model from the output and associates them with the call.
But Weave also allows users to add custom costs for their own models. This cookbook walks through the steps to set up a custom model cost.
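As a quick sketch (the model ID and per-token prices below are made up for illustration):

import weave

client = weave.init('trial-zenbusiness/cost-demo')  # hypothetical project name

# Associate per-token costs with a custom model ID;
# the prices here are illustrative, not real
client.add_cost(
    llm_id="my-custom-model",
    prompt_token_cost=0.000002,
    completion_token_cost=0.000004,
)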

Track and evaluate GenAI applications via W&B Weave



Weave is a lightweight toolkit for tracking and evaluating GenAI applications.
The goal is to bring rigor, best-practices, and composability to the inherently experimental process of developing GenAI applications, without introducing cognitive overhead.


Weave can be used to:
  • Log and debug model inputs, outputs, and traces
  • Build rigorous, apples-to-apples evaluations for language model use cases
  • Capture valuable feedback that can be used to build new training and evaluation sets
  • Organize all the information generated across the LLM workflow, from experimentation to evaluations to production
A quick-start guide to Weave can be found here.

Get started with W&B Weave - Quickstart guide

Once you have authenticated with W&B, you can start by creating a Weave project with the following command:
import weave
weave.init('trial-zenbusiness/<project-name>') # this ensures the project is created in the trial-zenbusiness team that has been created for the PoC
Now you can decorate the functions you want to track by adding the one-line decorator @weave.op() to your functions.
Here's what an example script would look like (feel free to copy-paste this into your IDE and run it):
import weave
from openai import OpenAI

client = OpenAI()

# Weave will track the inputs, outputs and code of this function
@weave.op()
def extract_dinos(sentence: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": """In JSON format extract a list of `dinosaurs`, with their `name`,
their `common_name`, and whether its `diet` is a herbivore or carnivore"""
            },
            {
                "role": "user",
                "content": sentence
            }
        ],
        response_format={"type": "json_object"}
    )
    return response.choices[0].message.content


# Initialise the weave project
weave.init('trial-zenbusiness/jurassic-park')

sentence = """I watched as a Tyrannosaurus rex (T. rex) chased after a Triceratops (Trike), \
both carnivore and herbivore locked in an ancient dance. Meanwhile, a gentle giant \
Brachiosaurus (Brachi) calmly munched on treetops, blissfully unaware of the chaos below."""

result = extract_dinos(sentence)
print(result)

FAQs

W&B Weave

1. How does Tracing with W&B Weave work?
This Loom video (~4 mins) walks through how tracing works with W&B Weave.
2. How can I add a custom cost for my GenAI model?
You can add a custom cost by using the add_cost method. This guide walks you through the steps of adding a custom cost. Additionally, we have this cookbook on setting up a custom cost model, with an associated notebook.
3. How can I create my own custom Scorers with W&B Weave?
W&B Weave has its own predefined scorers that you can use, and you can also create your own. This documentation walks through creating your own scorers with W&B Weave.
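For example, a minimal class-based scorer might look like this (the conciseness check is purely illustrative):

import weave
from weave import Scorer

class LengthScorer(Scorer):
    # Illustrative metric: flag answers that run too long
    @weave.op
    def score(self, output: str) -> dict:
        return {"is_concise": len(output) < 200}

Scorers can also accept dataset columns as extra arguments to score, so you can compare output against ground truth.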
4. Can I control/customize the data that is logged?
Yes. If you want to change the data that is logged to Weave without modifying the original function (e.g. to hide sensitive data), you can pass postprocess_inputs and postprocess_output to the op decorator, as sketched below.
Here are more details on how to do so.
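For example, a minimal sketch of redacting an input field before it is logged (the ssn field and lookup function are hypothetical):

import weave
from typing import Any

weave.init('trial-zenbusiness/pii-demo')  # hypothetical project name

def redact_inputs(inputs: dict[str, Any]) -> dict[str, Any]:
    # Drop the sensitive field before Weave logs the call's inputs
    return {k: v for k, v in inputs.items() if k != "ssn"}

@weave.op(postprocess_inputs=redact_inputs)
def lookup_customer(name: str, ssn: str) -> str:
    # Hypothetical lookup; the function itself still receives the real value
    return f"Found record for {name}"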
5. How do I publish prompts to W&B Weave?
W&B Weave supports Prompts as first-class objects. You can use weave.publish() to log prompts, or any other object (e.g. Datasets, Models, etc.), to Weave. This guide walks through the details of publishing prompts to W&B Weave.
6. How do I handle PII/sensitive data so it's not logged to W&B Weave?
W&B Weave has different ways to redact sensitive data; this guide walks through the details on the different ways to handle and redact PII data.