GSK Onboarding Guide (W&B Weave)
Created on May 8 | Last edited on May 8
To obtain access to a specific team, please contact Roopa Mahishi (roopa.m.mahishi@gsk.com)
Please post any questions or feedback on this document to the shared Teams channel (link). Contact al@wandb.com if you need access.
Contents
- About W&B Weave 💫
- Quick Documentation Links
- Getting Access to Weave
- Environment Variables
- W&B Installation & Authentication
- Track and evaluate GenAI applications via W&B Weave
- Get started with W&B Weave - Basic Tracing
- Integrations
- Example Integration Usage
- Spinning up an Evaluation
- Tracking Objects
- Prompt Tracking
- StringPrompt
- MessagesPrompt
- Parameterizing prompts
- Tracking Models
- Tracking Datasets
- Retrieve Published Objects & Ops
- Adding Programmatic Feedback
- Provide feedback via the UI
- Provide feedback via the SDK
- Add Feedback to a Call
- Add human annotations
- Create a human annotation scorer in the UI
- Create a human annotation scorer using the API
- Use a human annotation scorer using the API
- FAQs
About W&B Weave 💫
Weave is an LLMOps platform by Weights & Biases (W&B) built to facilitate collaboration and reproducibility across the machine learning development lifecycle. Machine learning projects can quickly become a mess without some best practices in place to aid developers and scientists as they iterate on models and move them to production.
Weave is lightweight enough to work with whatever framework or platform teams are currently using, but enables teams to quickly start logging their important results to a central system of record. W&B also offers built-in visualization, automation, and documentation capabilities for better debugging, model tuning, and project management.
Quick Documentation Links
Getting Access to Weave
If you do not already have a W&B account, please ask an admin to send you an invite.
Our suggestion is to create a team that you can then log projects to. This will need to be done by an admin.
Environment Variables
Setting the following environment variables will make things easier: it saves you from having to specify the host and API key explicitly, as described further down in this document.
import os

os.environ["WANDB_BASE_URL"] = "https://gsk.wandb.io"
os.environ["WANDB_API_KEY"] = "your-wandb-api-key"
W&B Installation & Authentication
To start using W&B Weave, you first need to install the Python package (if not already installed)
pip install weave
Once installed, authenticate your W&B user account by logging in through the CLI or the SDK. You should have received an email invitation to sign up to the platform; after signing up, you can find your API token in the "Settings" section under your profile.
wandb login --host <YOUR W&B HOST URL> <YOUR API TOKEN>
OR through Python:
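For example, a minimal sketch of a programmatic login, assuming the GSK host URL from above and your own API key (both can also be picked up from the WANDB_BASE_URL and WANDB_API_KEY environment variables set earlier):

import wandb

# Log in to the GSK-hosted W&B instance programmatically.
# If WANDB_BASE_URL and WANDB_API_KEY are already set, calling
# wandb.login() with no arguments will pick them up instead.
wandb.login(host="https://gsk.wandb.io", key="your-wandb-api-key")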
Once you are logged in, you are ready to track your workflows!
Track and evaluate GenAI applications via W&B Weave

Weave is a lightweight toolkit for tracking and evaluating GenAI applications.
The goal is to bring rigor, best-practices, and composability to the inherently experimental process of developing GenAI applications, without introducing cognitive overhead.

Weave can be used to:
- Log and debug model inputs, outputs, and traces
- Build rigorous, apples-to-apples evaluations for language model use cases
- Capture valuable feedback that can be used to build new training and evaluation sets
- Organize all the information generated across the LLM workflow, from experimentation to evaluations to production
Get started with W&B Weave - Basic Tracing
Once you have authenticated with W&B, you can start by creating a Weave project with the following command
import weave

weave.init('<entity-name>/<project-name>')
# This ensures the project is created in the relevant team.
# If you have WANDB_ENTITY and WANDB_PROJECT set as env vars, you won't need to specify these.
# You may need to create the team.
Now you can track the functions you care about by adding the one-line weave.op() decorator to them.
Here's what an example script would look like (feel free to copy/paste this into your IDE and run it):
import weave
from openai import OpenAI

client = OpenAI()

# Weave will track the inputs, outputs and code of this function
@weave.op()
def extract_dinos(sentence: str) -> dict:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": """In JSON format extract a list of `dinosaurs`, with their `name`,
their `common_name`, and whether its `diet` is a herbivore or carnivore"""
            },
            {
                "role": "user",
                "content": sentence
            }
        ],
        response_format={"type": "json_object"}
    )
    return response.choices[0].message.content

# Initialize the weave project
weave.init('<team-name>/jurassic-park')

sentence = """I watched as a Tyrannosaurus rex (T. rex) chased after a Triceratops (Trike), \
both carnivore and herbivore locked in an ancient dance. Meanwhile, a gentle giant \
Brachiosaurus (Brachi) calmly munched on treetops, blissfully unaware of the chaos below."""

result = extract_dinos(sentence)
print(result)
Integrations
Weave provides automatic logging integrations for popular LLM providers and orchestration frameworks. These integrations allow you to seamlessly trace calls made through various libraries, enhancing your ability to monitor and analyze your AI applications, even without explicitly using the weave.op() decorator. We integrate with the following LLM Providers & Frameworks:
- Amazon Bedrock
- Anthropic
- Cerebras
- Cohere
- Google
- LiteLLM
- MistralAI
- OpenAI
- OpenAI Agents SDK
- LangChain
- LlamaIndex
Example Integration Usage
Here, we're automatically tracking all calls to OpenAI.
from openai import OpenAI
import weave

weave.init(PROJECT)

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "system",
            "content": "You are a grammar checker, correct the following user input.",
        },
        {"role": "user", "content": "That was so easy, it was a piece of pie!"},
    ],
    temperature=0,
)

generation = response.choices[0].message.content
print(generation)

Spinning up an Evaluation
Evaluation-driven development helps you reliably iterate on an application. The Evaluation class is designed to assess the performance of a Model on a given Dataset or set of examples using scoring functions.
See a preview of the API below:
import asyncio

import weave
from weave import Evaluation

# Define any custom scoring function
@weave.op()
def exact_match(expected: str, output: dict) -> dict:
    # Here is where you'd define the logic to score the model output
    return {"match": expected == output}

# Score your examples using scoring functions
evaluation = Evaluation(
    dataset=dataset,  # can be a list of dictionaries or a weave.Dataset object
    scorers=[exact_match],  # can be a list of scoring functions
)

# Start tracking the evaluation
weave.init(PROJECT)

# Run the evaluation
print(asyncio.run(evaluation.evaluate(corrector)))
# if you're in a Jupyter Notebook, run:
# await evaluation.evaluate(corrector)
Follow the Build an Evaluation pipeline tutorial to learn more about Evaluation and begin iteratively improving your applications.

Tracking Objects
Organizing experimentation is difficult when there are many moving pieces. You can capture and organize the experimental details of your app, like your system prompt or the model you're using, as Weave Objects. This helps you organize and compare different iterations of your app. In this section, we will cover the following; you can view additional documentation here.
Prompt Tracking
Creating, evaluating, and refining prompts is a core activity for AI engineers. Small changes to a prompt can have big impacts on your application's behavior. Weave lets you create prompts, save and retrieve them, and evolve them over time.
Weave is unopinionated about how a Prompt is constructed. If your needs are simple, you can use our built-in weave.StringPrompt or weave.MessagesPrompt classes. If your needs are more complex, you can subclass those or our base class weave.Prompt and override the format method.
When you publish one of these objects with weave.publish, it will appear in your Weave project on the "Prompts" page.
StringPrompt
import weave

weave.init('intro-example')

system_prompt = weave.StringPrompt("You are a pirate")
weave.publish(system_prompt, name="pirate_prompt")

from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "system",
            "content": system_prompt.format()
        },
        {
            "role": "user",
            "content": "Explain general relativity in one paragraph."
        }
    ],
)
Perhaps this prompt does not yield the desired effect, so we modify the prompt to be more clearly instructive.
import weave

weave.init('intro-example')

system_prompt = weave.StringPrompt("Talk like a pirate. I need to know I'm listening to a pirate.")
weave.publish(system_prompt, name="pirate_prompt")

from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "system",
            "content": system_prompt.format()
        },
        {
            "role": "user",
            "content": "Explain general relativity in one paragraph."
        }
    ],
)
When viewing this prompt object, I can see that it has two versions.

I can also select them for comparison to see exactly what changed.

MessagesPrompt
The MessagesPrompt can be used to replace an array of Message objects.
import weave

weave.init('intro-example')

prompt = weave.MessagesPrompt([
    {
        "role": "system",
        "content": "You are a stegosaurus, but don't be too obvious about it."
    },
    {
        "role": "user",
        "content": "What's good to eat around here?"
    }
])
weave.publish(prompt, name="dino_prompt")

from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=prompt.format(),
)
Parameterizing prompts
As the format method's name suggests, you can pass arguments to fill in template placeholders in the content string.
import weave

weave.init('intro-example')

prompt = weave.StringPrompt("Solve the equation {equation}")
weave.publish(prompt, name="calculator_prompt")

from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": prompt.format(equation="1 + 1 = ?")
        }
    ],
)
Tracking Models

Models are such a common object type that we have a special class to represent them: weave.Model. The only requirement is that you define a predict method.
from openai import OpenAI
import weave

weave.init(PROJECT)

class OpenAIGrammarCorrector(weave.Model):
    # Properties are entirely user-defined
    openai_model_name: str
    system_message: str

    @weave.op()
    def predict(self, user_input):
        client = OpenAI()
        response = client.chat.completions.create(
            model=self.openai_model_name,
            messages=[
                {"role": "system", "content": self.system_message},
                {"role": "user", "content": user_input},
            ],
            temperature=0,
        )
        return response.choices[0].message.content

corrector = OpenAIGrammarCorrector(
    openai_model_name="gpt-4o-mini",
    system_message="You are a grammar checker, correct the following user input.",
)

result = corrector.predict("That was so easy, it was a piece of pie!")
print(result)
Tracking Datasets
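Datasets are versioned Weave objects like prompts and models: you publish a named collection of example rows and retrieve it later by name. Below is a minimal sketch, assuming a hypothetical project and a small set of grammar-correction examples:

import weave

weave.init('<team-name>/<project-name>')

# A Dataset is a named, versioned collection of example rows (dictionaries).
dataset = weave.Dataset(
    name="grammar-examples",
    rows=[
        {"user_input": "That was so easy, it was a piece of pie!"},
        {"user_input": "He find it hard to believe."},
    ],
)

# Publishing stores and versions the dataset in your Weave project.
weave.publish(dataset)

# Retrieve the latest version later by name and inspect a row.
fetched = weave.ref("grammar-examples").get()
print(fetched.rows[0])

A published dataset (or a plain list of dictionaries) can then be passed as the dataset argument of the Evaluation shown earlier.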
Retrieve Published Objects & Ops
You can publish objects and then retrieve them in your code. You can even call functions from your retrieved objects!
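As a sketch of where the ref used below comes from, assuming the OpenAIGrammarCorrector from the Tracking Models section has already been defined:

import weave

weave.init(PROJECT)

# Publishing returns a reference to the stored object; its entity, name,
# and digest identify the exact version and are used to build the URL below.
ref = weave.publish(corrector, name="grammar-corrector")
print(ref.entity, ref.name, ref.digest)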


import weave

weave.init(PROJECT)

# Note: this url is available from the UI after publishing the object!
ref_url = f"weave:///{ref.entity}/{PROJECT}/object/{ref.name}:{ref.digest}"
fetched_collector = weave.ref(ref_url).get()

# Notice: this object was loaded from remote location!
result = fetched_collector.predict("That was so easy, it was a piece of pie!")
print(result)
Adding Programmatic Feedback
Weave provides an integrated feedback system, allowing users to provide call feedback directly through the UI or programmatically via the SDK. Various feedback types are supported, including emoji reactions, textual comments, and structured data, enabling teams to:
- Build evaluation datasets for performance monitoring.
- Identify and resolve LLM content issues effectively.
- Gather examples for advanced tasks like fine-tuning.
You can find SDK usage examples for feedback in the UI under the Use tab in the call details page.
Provide feedback via the UI
You can add or remove a reaction, and add a note using the icons that are located in both the call table and individual call details pages.
- Call table: Located in the Feedback column in the appropriate row of the call table.
- Call details page: Located in the upper right corner of each call details page.
To add a reaction:
1. Click the emoji icon.
2. Add a thumbs up, thumbs down, or click the + icon for more emojis.

Provide feedback via the SDK
You can use the Weave SDK to programmatically add, remove, and query feedback on calls.
Query a project's feedback
You can query the feedback for your Weave project using the SDK. The SDK supports the following feedback query operations:
- client.get_feedback(): Returns all feedback in a project.
- client.get_feedback("<feedback_uuid>"): Return a specific feedback object specified by <feedback_uuid> as a collection.
- client.get_feedback(reaction="<reaction_type>"): Returns all feedback objects for a specific reaction type.
You can also get additional information for each feedback object in client.get_feedback():
- id: The feedback object ID.
- created_at: The creation time information for the feedback object.
- feedback_type: The type of feedback (reaction, note, custom).
- payload: The feedback payload.
import weave

client = weave.init('intro-example')

# Get all feedback in a project
all_feedback = client.get_feedback()

# Fetch a specific feedback object by id.
# The API returns a collection, which is expected to contain at most one item.
one_feedback = client.get_feedback("<feedback_uuid>")[0]

# Find all feedback objects with a specific reaction. You can specify offset and limit.
thumbs_up = client.get_feedback(reaction="👍", limit=10)

# After retrieval, view the details of individual feedback objects.
for f in client.get_feedback():
    print(f.id)
    print(f.created_at)
    print(f.feedback_type)
    print(f.payload)
Add Feedback to a Call
You can add feedback to a call using the call's UUID. To use the UUID to get a particular call, retrieve it during or after call execution. The SDK supports the following operations for adding feedback to a call:
- call.feedback.add_reaction("<reaction_type>"): Add one of the supported <reaction_types> (emojis), such as 👍.
- call.feedback.add_note("<note>"): Add a note.
- call.feedback.add("<label>", <object>): Add a custom feedback <object> specified by <label>.
import weave

client = weave.init('intro-example')
call = client.get_call("<call_uuid>")

# Adding an emoji reaction
call.feedback.add_reaction("👍")

# Adding a note
call.feedback.add_note("this is a note")

# Adding custom key/value pairs.
# The first argument is a user-defined "type" string.
# Feedback must be JSON serializable and less than 1 KB when serialized.
call.feedback.add("correctness", {"value": 5})
Add human annotations
Human annotations are supported in the Weave UI. To make human annotations, you must first create a Human Annotation scorer using either the UI or the API. Then, you can use the scorer in the UI to make annotations, and modify your annotation scorers using the API.
Create a human annotation scorer in the UI
To create a human annotation scorer in the UI, do the following:
1. In the sidebar, navigate to Scorers.
2. In the upper right corner, click + Create scorer.
3. In the configuration page, set:
- Scorer type to Human annotation
- Name
- Description
- Type, which determines the type of feedback that will be collected, such as boolean or integer.
In the following example, a human annotator is asked to select which type of document the LLM ingested. As such, the Type selected for the scorer configuration is an enum containing the possible document types.

Create a human annotation scorer using the API
Human annotation scorers can also be created through the API. Each scorer is its own object, which is created and updated independently. To create a human annotation scorer programmatically, do the following:
1. Import the AnnotationSpec class from weave.flow.annotation_spec.
2. Use the publish method from weave to create the scorer.
In the following example, two scorers are created. The first scorer, Temperature, is used to score the perceived temperature of the LLM call. The second scorer, Tone, is used to score the tone of the LLM response. Each scorer is created with weave.publish and an associated object ID (temperature-scorer and tone-scorer).
import weave
from weave.flow.annotation_spec import AnnotationSpec

client = weave.init("feedback-example")

spec1 = AnnotationSpec(
    name="Temperature",
    description="The perceived temperature of the llm call",
    field_schema={
        "type": "number",
        "minimum": -1,
        "maximum": 1,
    },
)

spec2 = AnnotationSpec(
    name="Tone",
    description="The tone of the llm response",
    field_schema={
        "type": "string",
        "enum": ["Aggressive", "Neutral", "Polite", "N/A"],
    },
)

weave.publish(spec1, "temperature-scorer")
weave.publish(spec2, "tone-scorer")
Use a human annotation scorer using the API
The feedback API allows you to use a human annotation scorer by specifying a specially constructed name and an annotation_ref field. You can obtain the annotation_spec_ref from the UI by selecting the appropriate tab, or during the creation of the AnnotationSpec.
import weaveclient = weave.init("feedback-example")call = client.get_call("<call_id>")annotation_spec = weave.ref("<annotation_spec_ref_uri>")call.feedback.add(feedback_type="wandb.annotation." + annotation_spec.name,payload={"value": 1},annotation_ref=annotation_spec.uri(),)
FAQs
What information does Weave capture for a function?
How can I disable code capture?
How can I disable system information capture?
How can I disable client information capture?
Will Weave affect my function's execution speed?
How do I render Markdown in the UI?
Wrap your string with weave.Markdown(...) before saving, and use weave.publish(...) to store it. Weave uses the object’s type to determine rendering, and weave.Markdown maps to a known UI renderer. The value will be shown as a formatted Markdown object in the UI.
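As a minimal sketch, assuming a project has already been initialized and using a hypothetical object name:

import weave

weave.init('<team-name>/<project-name>')

# Wrap the string so the UI knows to render it as Markdown.
notes = weave.Markdown("# Model notes\n\n- Prompt v2 is live\n- Evaluation set refreshed")

# Publish it; the object will render as formatted Markdown in the UI.
weave.publish(notes, name="model_notes")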