How to use Google's VertexAI PaLM 2 with Weights & Biases

A brief report covering how to leverage the powerful PaLM 2 model alongside W&B.

Introduction

Google's PaLM 2 is a new large language model you can use right now. It's a capable LLM, ready to drop into your applications, with a Python SDK you can use to interact with the model from your own code.
And of course, you can use this model with W&B to track your different prompt experiments. You can even track and trace more complex agent pipelines and chains with LangChain and W&B.
In this report, we'll walk you through how to integrate Google's PaLM model into your LLM pipelines. And, if you'd like to run the code yourself, simply click the handy Colab below.




First things first

Before you can try the text prompts, you'll need to set up your GCP project and install the Python SDK.
The code:
# set up your GCP project and install the Python SDK:
#   pip install google-cloud-aiplatform
# then authenticate, e.g. with `gcloud auth application-default login`
import vertexai
from vertexai.language_models import TextGenerationModel

project_id = "wandb-growth"
location = "us-central1"

# initialize the Vertex AI API
vertexai.init(project=project_id, location=location)

# load the PaLM 2 text model and generate a completion
model = TextGenerationModel.from_pretrained("text-bison@001")
response = model.predict(
    "Give me ten interview questions for the role of program manager.",
    temperature=0.9,
)
print(response.text)

Using W&B Tables to track your experiments

When running experiments and calling PaLM, you'll want to keep track of the different prompts and their results. You can log individual prompt/response pairs with wandb.log, or collect them in a wandb.Table, which is what we do below.
def palm_call(
    prompt: str,
    temperature: float = 0.7,
    max_output_tokens: int = 256,
    top_p: float = 0.8,
    top_k: int = 40,
    project_id: str = project_id,
    location: str = location,
) -> str:
    vertexai.init(project=project_id, location=location)
    parameters = {
        "temperature": temperature,  # temperature controls the degree of randomness in token selection
        "max_output_tokens": max_output_tokens,  # token limit determines the maximum amount of text output
        "top_p": top_p,  # tokens are selected from most probable to least until the sum of their probabilities equals the top_p value
        "top_k": top_k,  # a top_k of 1 means the selected token is the most probable among all tokens
    }

    model = TextGenerationModel.from_pretrained("text-bison@001")
    response = model.predict(
        prompt,
        **parameters,
    )
    return response.text

import time

import wandb
from tqdm import tqdm

# let's define some configuration parameters
config = dict(
    temperature = 1.0,
    max_output_tokens = 128,
    top_p = 0.8,
    top_k = 40,
)

# create a W&B run to hold the results
run = wandb.init(project="wandb-palm", job_type="generation")

table = wandb.Table(columns=["model", "time", "temperature", "max_output_tokens", "top_p", "top_k", "prompt", "response"])

# we iterate through the queries (a list of prompt strings defined elsewhere)
# and call the model, adding the results to the table
for q in tqdm(queries):
    t0 = time.perf_counter()
    res = palm_call(q, **config)
    table.add_data(
        "text-bison@001",
        time.perf_counter() - t0,
        config["temperature"],
        config["max_output_tokens"],
        config["top_p"],
        config["top_k"],
        q,
        res,
    )

# log the table so the generations show up in the W&B UI
run.log({"generations": table})
You can visualize your model generations alongside parameters like temperature and top_k in one central place. wandb.Tables also support dataframe-like mechanics, so you can filter, sort, and group your data.
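If you'd rather log each call as it happens instead of building up a table, wandb.log works too, as mentioned above. Here's a minimal sketch; the key names are just illustrative:
# a minimal sketch of per-call logging with wandb.log
# (the key names here are arbitrary examples)
prompt = "Give me ten interview questions for the role of program manager."
response = palm_call(prompt, **config)
wandb.log({"prompt": prompt, "response": response, **config})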



Track your experiments by tracing agent calls with W&B and LangChain 🦜🔗

LangChain supports PaLM integration out of the box, so you can start tracking and debugging your LLM-powered applications with W&B now!
import os

import wandb
from langchain.agents import AgentType, initialize_agent
from langchain.llms import VertexAI

# turn on W&B tracing for LangChain
os.environ["LANGCHAIN_WANDB_TRACING"] = "true"

# create a wandb run on our project
wandb.init(project="wandb-palm", job_type="generation")

# define the LLM to use
llm = VertexAI(model_name="text-bison@001", project="wandb-growth", location="us-central1")

# define an agent with custom or off-the-shelf tools
tools = [Tool1(), Tool2()]  # placeholder tools defined elsewhere
agent = initialize_agent(
    tools,
    llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    handle_parsing_errors=True,
    verbose=True,
)

# W&B will keep track of the traces for you!
agent.run(
    "Do something fancy with my tools!"
)
Using the trace view, you can inspect the different tools used during the chain, filter out failed calls, and debug the inputs and outputs of each LLM call.
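Each agent.run call produces its own trace, so you can send several queries through the agent and then close the run to make sure everything is synced to W&B. A minimal sketch (the example queries are placeholders):
# run a few queries through the agent; each call produces a trace in W&B
for q in ["What's 2 + 2?", "Summarize this report in one sentence."]:
    agent.run(q)

# finish the run so all traces and tables are flushed to W&B
wandb.finish()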



Conclusion

This piece is a simple, quick tutorial on how to use W&B's tooling to get better insight into your PaLM experiments. If you'd like to read more about LLMs on W&B, we recommend the articles below. And if you're a hands-on learner, check out our free courses: we've been building a ton of interactive, LLM-specific courses we think you'd enjoy. You can find those by following this link.
Thanks for reading!
