
Getting Started with Amazon Bedrock and W&B Weave

Getting the most out of your LLMs by using Weave to trace and manage your API calls
We’re very proud to announce the public availability of W&B Weave, a suite of tools for developing and productionizing LLM-based applications. You can use Weave to:
  • Log and version LLM interactions and surrounding data, from development to production
  • Experiment with prompting techniques, model changes, and parameters
  • Evaluate your models and measure your progress
Go to https://wandb.me/weave to get started.
💡
In this piece, we're going to walk you through how to use Amazon Bedrock alongside W&B Weave.
Let's get started.



Amazon Bedrock: Your hub for LLMs on AWS

Amazon Bedrock is a fully managed service offering high-performing foundation models from AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon via a single API.
It provides capabilities to build secure, private, and responsible generative AI applications. You can experiment with and customize top foundation models, build agents that integrate with your systems and data sources, and deploy AI capabilities serverlessly without managing infrastructure. Keep in mind, you'll need to request model access to use Bedrock LLMs.
Let's get into the code:

Calling LLMs within Bedrock

For the purposes of this walkthrough, you will need access to both Weights & Biases and Bedrock. The links we provided above will get you up and running.
First, you'll need to install Weave:
!pip install weave
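You'll also need boto3 (the AWS SDK for Python) and AWS credentials configured for an account with Bedrock enabled. If boto3 isn't already installed:
!pip install boto3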
Next, create your Weave project to version your LLM interactions:
import weave
weave.init('bedrock-weave')
Then, let's decorate the call function with the @weave.op() decorator:
import json
import boto3

@weave.op() # <- just add this 😎
def generate_text(
    model_id: str,
    prompt: str,
    max_tokens: int = 400,
    temperature: float = 0.7,
) -> list:
    # Check the model access page for your region; in my case, us-east-1:
    bedrock = boto3.client(service_name='bedrock-runtime')

    body = json.dumps({
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    })
    response = bedrock.invoke_model(body=body, modelId=model_id)

    response_body = json.loads(response.get('body').read())
    return response_body.get('outputs')
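If the default region in your AWS config isn't the one where you enabled model access, you can pass it explicitly when creating the client (us-east-1 is just the region used in this walkthrough):
bedrock = boto3.client(service_name='bedrock-runtime', region_name='us-east-1')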
We can now call the model and click on the link generated at the end. To use a foundation model with the Amazon Bedrock API, you'll need its model ID.
Here's an example of using the Mistral 7B model for recipe generation.
model_id = 'mistral.mistral-7b-instruct-v0:2'

# this is the required prompt format for mistral instruct
prompt = """<s>[INST] Create a Vegan Carbonara Recipe[/INST] """

outputs = generate_text(model_id, prompt)

for index, output in enumerate(outputs):
    print(f"Output {index + 1}\n----------")
    print(f"Text:\n{output['text']}\n")
    print(f"Stop reason: {output['stop_reason']}\n")

# Output 1
# ----------
# Text:
# Vegan Carbonara Recipe:

# Ingredients:
# - 12 oz (340g) spaghetti or other long pasta
# - 1/2 cup (120ml) unsweetened almond milk or cashew cream
# - 1/2 cup (100g) nutritional yeast
# - 1/3 cup (80g) cooked and mashed sweet potato or pumpkin
# - 1 tbsp olive oil
# - 1 cup (200g) steamed or roasted vegetables such as zucchini, broccoli, or asparagus
# - 1/2 cup (100g) chopped mushrooms
# - 1/2 cup (100g) chopped onion
# - 1 clove garlic, minced
# - 1/2 cup (100g) chopped walnuts or cashews, toasted
# - 1/2 cup (100g) vegetable bacon or pancetta, cooked and crumbled (optional)
# - Salt and freshly ground black pepper, to taste
# - Red pepper flakes, for serving (optional)
# - Fresh parsley or basil, for serving (optional)

# Instructions:

# 1. Cook the pasta according to package instructions in salted water until al dente. Drain and set aside, reserving 1 cup of the pasta water.

# 2. In a blender or food processor, combine the almond milk or cashew cream, nutritional yeast, mashed sweet potato or pumpkin, and olive oil. Blend until smooth and creamy. Taste and adjust seasoning with salt and pepper as needed.

# 3. In a large skillet, sauté the chopped onion, garlic, and

# Stop reason: length
👉 🍩 Open Weave Trace

What does Weave track?

At a high level:
  • Code: ensure all code surrounding generative AI API calls is versioned and stored
  • Data: where possible, version and store any datasets, knowledge stores, etc
  • Traces: permanently capture traces of functions surrounding generative AI calls
Weave makes this easy. Wrap any Python function with @weave.op() and Weave will capture and version the function's code and log traces of every call, including inputs and outputs.
Here's what that looks like:

As you can see, the stop reason was length, so we may want to increase the max_tokens parameter to get the full answer. We can do this by changing the parameter and re-calling the function.
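For example (the exact value here is arbitrary; pick whatever token budget fits your use case):
outputs = generate_text(model_id, prompt, max_tokens=1000)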
We then get a new link (with a new Trace) with the full output:


Keep track of nested calls

You can decorate any function in Python. For instance, you could create a run function and keep track of the prompt formatting and the LLM call as separate nested calls:
@weave.op()
def format_prompt(prompt: str) -> str:
    return f"""<s>[INST] {prompt}[/INST] """

@weave.op()
def run(prompt: str) -> None:
    prompt = format_prompt(prompt)
    outputs = generate_text(model_id, prompt, max_tokens=1000)
    for index, output in enumerate(outputs):
        print(f"Output {index + 1}\n----------")
        print(f"Text:\n{output['text']}\n")
        print(f"Stop reason: {output['stop_reason']}\n")
run("What clothes do I need to pack in winter for a trip to Chamonix?")


Trying different models

A very powerful feature of Bedrock is that you can quickly swap models simply by changing the model ID. For instance, we can grab the larger Mixtral-8x7B and re-run our experiment:
model_id = 'mistral.mixtral-8x7b-instruct-v0:1'

prompt = """<s>[INST] Create a Vegan Carbonara Recipe[/INST] """

outputs = generate_text(model_id, prompt, max_tokens=1000)

for index, output in enumerate(outputs):
    print(f"Output {index + 1}\n----------")
    print(f"Text:\n{output['text']}\n")
    print(f"Stop reason: {output['stop_reason']}\n")
💡
Beware that switching models may require formatting your prompt differently. In this case, both models share the same prompt structure, so there's no need to update it.
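Amazon's Titan text models, for example, don't use the [INST] wrapper and also expect a different request body. Here's a rough sketch of what that invocation could look like (field names follow the Titan Text schema as I understand it, so double-check the Bedrock docs):
bedrock = boto3.client(service_name='bedrock-runtime')

# Titan uses "inputText" plus a "textGenerationConfig" block instead of a raw prompt string
titan_body = json.dumps({
    "inputText": "Create a Vegan Carbonara Recipe",
    "textGenerationConfig": {
        "maxTokenCount": 400,
        "temperature": 0.7,
    },
})
response = bedrock.invoke_model(body=titan_body, modelId='amazon.titan-text-express-v1')
print(json.loads(response['body'].read())['results'][0]['outputText'])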

Anthropic’s Claude Models Available in Bedrock

One of the key features of Bedrock is the availability of Anthropic's powerful suite of Claude models. These models come in different sizes and capabilities, but our code won't change much to take advantage of them:
@weave.op() # <- just add this 😎
def anthropic_call(
    model_id: str,
    messages: list,
    max_tokens: int = 400,
    system_prompt: str = None,
) -> list:
    bedrock = boto3.client(service_name='bedrock-runtime')

    body = json.dumps({
        "system": system_prompt,
        "messages": messages,
        "max_tokens": max_tokens,
        "anthropic_version": "bedrock-2023-05-31"
    })
    response = bedrock.invoke_model(
        body=body,
        modelId=model_id
    )

    response_body = json.loads(response.get('body').read())
    outputs = response_body.get('content')
    return outputs
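As a quick smoke test (the prompt and system prompt below are just placeholders; the model ID is the Claude 3 Sonnet ID we use later in this walkthrough):
claude_id = 'anthropic.claude-3-sonnet-20240229-v1:0'
messages = [{"role": "user", "content": "Suggest three plant-based swaps for guanciale."}]

outputs = anthropic_call(
    claude_id,
    messages=messages,
    system_prompt="You are a concise cooking assistant.",
    max_tokens=300,
)
print(outputs[0]['text'])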

Model Access

Amazon Bedrock users need to request access to models before they can use them. To request access to additional models, choose Model access in the navigation pane of the Amazon Bedrock console. For more information, see Model access in the Bedrock documentation.
💡
👀 look for the link in the bottom left corner of the Bedrock landing page
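You can also list the foundation models available in your region programmatically. A minimal sketch using the Bedrock control-plane client (note that this is the bedrock client, not bedrock-runtime, and it shows what exists in the region rather than what you've been granted access to):
import boto3

bedrock_admin = boto3.client(service_name='bedrock')  # control plane, not bedrock-runtime
for summary in bedrock_admin.list_foundation_models()['modelSummaries']:
    print(summary['modelId'])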

A more complex use case: Documentation Translation

Anthropic's models are very powerful and capable of solving complex tasks. In this example, we will use Claude to translate the W&B documentation website.
First, we need to format the prompt in a slightly different manner, as this model expects a list of messages, each with a role and content. The system prompt is passed as an extra key in the request payload:
from pathlib import Path
from dataclasses import dataclass

system_prompt = """
# Instructions

You are a documentation translation assistant from English to {output_language}. We are translating valid docusaurus flavored markdown. Some rules to remember:

- Do not add extra blank lines.
- The results must be valid docusaurus markdown
- It is important to maintain the accuracy of the contents but we don't want the output to read like it's been translated. So instead of translating word by word, prioritize naturalness and ease of communication.
- In code blocks, just translate the comments and leave the code as is.

## Formatting Rules

Do not translate target markdown links. Never translate the part of the link inside (). For instance here [https://wandb.ai/site](https://wandb.ai/site) do not translate anything, but on this, you should translate the [] part:
[track metrics](./guides/track), [create logs](./guides/artifacts).
"""

human_prompt = """
Here is a chunk of documentation in docusaurus Markdown format to translate. Return the translation only, without adding anything else.
<Markdown start>
{md_chunk}
<End of Markdown>
"""

@dataclass
class PromptTemplate:
    system_prompt: str
    human_prompt: str
    language: str

    @weave.op() # <- just add this 😎
    def format_claude(self, md_chunk):
        "A formatting function for Claude models"
        system_prompt = self.system_prompt.format(output_language=self.language)
        human_prompt = self.human_prompt.format(md_chunk=md_chunk)
        messages = [{"role": "user", "content": human_prompt}]
        return system_prompt, messages
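We also need an instance of the template to pass around; the target language below is just an example, so swap in whichever language you need:
prompt_template = PromptTemplate(
    system_prompt=system_prompt,
    human_prompt=human_prompt,
    language="Spanish",  # example target language
)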
We can then wrap the PromptTemplate and the model call inside a weave.Model object so that our code stays organized:
from weave import Model

class ClaudeDocTranslator(Model):
    model_id: str = 'anthropic.claude-3-sonnet-20240229-v1:0'
    max_tokens: int = 2048
    prompt_template: PromptTemplate

    @weave.op()
    def format_doc(self, path: Path) -> tuple:
        "Read and format the document"
        doc = Path(path).read_text()
        system_prompt, messages = self.prompt_template.format_claude(doc)
        return system_prompt, messages

    @weave.op()
    def translate(self, path: Path) -> str:
        system_prompt, messages = self.format_doc(path)
        output = anthropic_call(
            self.model_id,
            messages=messages,
            system_prompt=system_prompt,
            max_tokens=self.max_tokens,
        )
        return output[0]['text']
Now let's translate one document:
model = ClaudeDocTranslator(prompt_template=prompt_template)
out = model.translate(Path("./docs/quickstart.md"))
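And if you want to translate a whole folder of docs, a simple loop does the job; the paths below are just an illustration:
out_dir = Path("./docs_translated")
out_dir.mkdir(exist_ok=True)

for md_file in sorted(Path("./docs").glob("*.md")):
    translated = model.translate(md_file)
    (out_dir / md_file.name).write_text(translated)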

Conclusions

We hope you enjoyed this brief tutorial on integrating W&B Weave with Amazon Bedrock. Remember that you'll need credentials for both Weights & Biases and AWS to get started.
Happy modeling!
Iterate on AI agents and models faster. Try Weights & Biases today.