How to use Google's VertexAI PaLM 2 with Weights & Biases

A brief report covering how to leverage the powerful PaLM 2 model alongside W&B.

Introduction

Google's PaLM 2 is a new large language model you can use right now. It's a capable LLM, ready to drop into your applications, with a Python SDK you can use to interact with the model from your own code.
And of course, you can use this model with W&B to track your different prompt experiments. You can even track and trace more complex agent pipelines and chains with LangChain and W&B.
In this report, we'll walk you through how to integrate Google's PaLM model into your LLM pipelines. And, if you'd like to run the code yourself, simply click the handy Colab below.




First things first

Before you can try the text prompts, you'll need to set up your GCP project and install the Python SDK.
The code:
# set up your GCP project and install the Python SDK:
#   pip install google-cloud-aiplatform
# then authenticate, e.g. with `gcloud auth application-default login`
import vertexai
from vertexai.language_models import TextGenerationModel

project_id = "wandb-growth"
location = "us-central1"

# initialize the Vertex AI API
vertexai.init(project=project_id, location=location)

# load the PaLM 2 text model and generate a completion
model = TextGenerationModel.from_pretrained("text-bison@001")
response = model.predict(
    "Give me ten interview questions for the role of program manager.",
    temperature=0.9,
)
print(response.text)

Using W&B Tables to track your experiments

When running experiments and calling PaLM, you'll want to keep track of the different prompts and their results. You can log individual prompt/response pairs with wandb.log, or collect them in a wandb.Table, which is what we do below.
def palm_call(
    prompt: str,
    temperature: float = 0.7,
    max_output_tokens: int = 256,
    top_p: float = 0.8,
    top_k: int = 40,
    project_id: str = project_id,
    location: str = location,
) -> str:
    vertexai.init(project=project_id, location=location)
    parameters = {
        "temperature": temperature,  # temperature controls the degree of randomness in token selection
        "max_output_tokens": max_output_tokens,  # token limit determines the maximum amount of text output
        "top_p": top_p,  # tokens are selected from most probable to least until the sum of their probabilities equals the top_p value
        "top_k": top_k,  # a top_k of 1 means the selected token is the most probable among all tokens
    }

    model = TextGenerationModel.from_pretrained("text-bison@001")
    response = model.predict(
        prompt,
        **parameters,
    )
    return response.text

import time

import wandb
from tqdm import tqdm

# let's define some configuration parameters
config = dict(
    temperature = 1.0,
    max_output_tokens = 128,
    top_p = 0.8,
    top_k = 40,
)

# create a W&B run to hold the results
run = wandb.init(project="wandb-palm", job_type="generation")

table = wandb.Table(columns=["model", "time", "temperature", "max_output_tokens", "top_p", "top_k", "prompt", "response"])

# we iterate through the queries (a list of prompt strings defined elsewhere)
# and call the model, adding the results to the table
for q in tqdm(queries):
    t0 = time.perf_counter()
    res = palm_call(q, **config)
    table.add_data(
        "text-bison@001",
        time.perf_counter() - t0,
        config["temperature"],
        config["max_output_tokens"],
        config["top_p"],
        config["top_k"],
        q,
        res,
    )

# log the table so the generations show up in the W&B UI
run.log({"generations": table})
You can visualize your model generations alongside parameters like temperature and top_k in one central place. wandb.Tables also support dataframe-like mechanics, so you can filter, sort, and group your data.
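If you'd rather log each call as it happens instead of building up a table, wandb.log works too, as mentioned above. Here's a minimal sketch; the key names are just illustrative:
# a minimal sketch of per-call logging with wandb.log
# (the key names here are arbitrary examples)
prompt = "Give me ten interview questions for the role of program manager."
response = palm_call(prompt, **config)
wandb.log({"prompt": prompt, "response": response, **config})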



Track your experiments by tracing agent calls with W&B and LangChain 🦜🔗

LangChain supports PaLM integration out of the box, so you can start tracking and debugging your LLM-powered applications with W&B now!
import os

import wandb
from langchain.agents import AgentType, initialize_agent
from langchain.llms import VertexAI

# turn on W&B tracing for LangChain
os.environ["LANGCHAIN_WANDB_TRACING"] = "true"

# create a wandb run on our project
wandb.init(project="wandb-palm", job_type="generation")

# define the LLM to use
llm = VertexAI(model_name="text-bison@001", project="wandb-growth", location="us-central1")

# define an agent with custom or off-the-shelf tools
tools = [Tool1(), Tool2()]  # placeholder tools defined elsewhere
agent = initialize_agent(
    tools,
    llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    handle_parsing_errors=True,
    verbose=True,
)

# W&B will keep track of the traces for you!
agent.run(
    "Do something fancy with my tools!"
)
Using the trace view, you can inspect the different tools used during the chain, filter out failed calls, and debug the inputs and outputs of each LLM call.
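Each agent.run call produces its own trace, so you can send several queries through the agent and then close the run to make sure everything is synced to W&B. A minimal sketch (the example queries are placeholders):
# run a few queries through the agent; each call produces a trace in W&B
for q in ["What's 2 + 2?", "Summarize this report in one sentence."]:
    agent.run(q)

# finish the run so all traces and tables are flushed to W&B
wandb.finish()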



Conclusion

This piece is a simple, quick tutorial on how to use W&B's tooling to get better insight into your PaLM experiments. If you'd like to read more about LLMs on W&B, we recommend the articles below. And if you're a hands-on learner, check out our free courses: we've been building a ton of interactive, LLM-specific courses we think you'd enjoy. You can find those by following this link.
Thanks for reading!
