Build a reliable GenAI search system with Gemini Grounding and Vertex AI
Boost generative AI accuracy with Gemini Grounding in Vertex AI. Learn data integration, custom grounding, and W&B Weave logging in this tutorial
Generating accurate and reliable model outputs is more important than ever, especially for generative AI applications that rely heavily on factual accuracy and on information not contained in the original training distribution. Vertex AI’s Gemini models, with their advanced grounding capabilities, enable developers to anchor responses to verified sources, enhancing the credibility and accuracy of generative AI outputs.
In this tutorial, we’ll explore the Gemini grounding process on Vertex AI, showing you how to integrate Google Search-based grounding, link personal data sources, and use W&B Weave for logging and visualization. With these tools, you can ensure that your models produce well-founded responses while gaining valuable insights into their performance.

Table of contents
- What is grounding in generative AI?
- Understanding Gemini Grounding
- Logging with W&B Weave
- Tutorial: Implementing Gemini Grounding with Vertex AI and W&B Weave
- Grounding with personal data
- Conclusion
What is grounding in generative AI?
Grounding in generative AI is the process of anchoring model outputs to data sources like live feeds, trusted knowledge bases, or specific datasets. This ensures generated content is accurate, reliable, and contextually appropriate, reducing the risk of hallucinations or fabricated responses.
Why is grounding your LLM important?
Without grounding, AI models rely solely on training data, which can result in "hallucinations"—plausible-sounding but incorrect answers. Grounding mitigates this by referencing authoritative sources, enhancing the reliability of AI outputs. This is especially critical in fields like healthcare, customer support, and legal advisories where factual consistency is paramount.
How does grounding work?
Grounding often uses techniques like retrieval-augmented generation (RAG), which retrieves context from external sources such as databases or search engines to produce fact-based outputs. Web grounding expands on this by incorporating live web data (e.g., Google Search) and linking directly to proprietary internal documents, removing the need for complex vector databases.
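To make the pattern concrete, here is a minimal, illustrative RAG sketch in Python; the toy document list and keyword retriever are stand-ins for a real search index or vector store, not part of any Vertex AI API:

# A toy in-memory "knowledge base"; a real system would query a search index or vector store.
documents = [
    "W&B Weave logs the inputs and outputs of decorated functions.",
    "Vertex AI lets Gemini models ground responses in Google Search results.",
]

def retrieve(query, docs, k=1):
    # Rank documents by naive keyword overlap with the query
    q_words = set(query.lower().split())
    return sorted(docs, key=lambda d: len(q_words & set(d.lower().split())), reverse=True)[:k]

def build_grounded_prompt(query):
    # Inject the retrieved context into the prompt that would be sent to the LLM
    context = "\n".join(retrieve(query, documents))
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"

print(build_grounded_prompt("What does Weave log?"))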
Practical applications of grounding
Grounding methods vary based on the application and data availability. Tools like Google’s Vertex AI enable AI models to integrate web search results or enterprise data repositories. These approaches allow AI systems to generate responses enriched with up-to-date, credible information, providing a higher standard of accuracy and trustworthiness for applications in diverse industries.
Understanding Gemini Grounding
Gemini Grounding connects generative AI models to reliable external data sources, enhancing the accuracy and relevance of their responses. By incorporating tools such as Google Search and enabling the integration of custom datasets, this approach reduces errors and improves the trustworthiness of outputs, making it suitable for applications that demand factual precision and reliability.
The grounding process can utilize web data through Google Search or integrate internal datasets stored in Vertex AI Search, and it works with Gemini models such as Gemini 1.5 Pro for detailed understanding, Gemini 1.5 Flash for low-latency responses, and Gemini 1.0 Pro. These methods enable the model to dynamically retrieve relevant information or draw on curated organizational data, ensuring that responses are both accurate and aligned with specific needs. This approach offers a streamlined solution for delivering fact-based outputs across a range of applications.
Supported models and languages
Grounding in Vertex AI supports a range of models designed to integrate external data sources and enhance response accuracy. The Gemini 1.5 Pro model, optimized for complex tasks, is well-suited for Google Search grounding due to its ability to handle intricate search results and provide nuanced, accurate responses.
For scenarios requiring low latency, the Gemini 1.5 Flash model is designed to deliver quick responses for time-sensitive applications. However, it is not the primary choice for Google Search grounding, as it prioritizes speed over complexity. The foundational Gemini 1.0 Pro model is also available and supports grounding.
These models support grounding with Google Search across multiple languages, including English, Spanish, and Japanese. Leveraging the Gemini series' grounding capabilities allows organizations to deliver contextually accurate responses, enhancing user experiences by ensuring timely and relevant information across various languages and regions.
Methods of grounding
Grounding can be achieved in two ways. The first method utilizes Google Search, enabling the model to retrieve and incorporate information from the web. By grounding responses in search results, the model can generate nuanced, up-to-date, and accurate answers, citing the sources it references for further interpretability. The second grounding method involves integrating personal data through Vertex AI. By connecting to data sources within Vertex AI Search, the model can use a curated collection of user-specific or enterprise data, producing responses that are both relevant and precise for the intended application.
In this tutorial, we will cover grounding implementation through the Vertex AI API, demonstrating how you can optimize your AI model’s performance by anchoring its outputs to these credible and timely data sources.
Logging with W&B Weave
Incorporating grounding into AI models enhances their accuracy by anchoring outputs to reliable data sources. However, to ensure the interpretability of grounding, it's crucial to monitor and log various aspects of the process. Tracking the URLs used for grounding allows you to verify the credibility of the sources and understand the context from which information is drawn. Additionally, tracking the dynamic retrieval confidence threshold in Vertex AI's Gemini models helps you fine-tune the balance between model-generated knowledge and grounded information. By logging these parameters, you can assess the performance of grounding mechanisms, identify potential issues, and make informed adjustments to improve model reliability.
W&B Weave offers a solution for tracking, analyzing, and visualizing your model’s grounding behavior over time.
Weave is a logging tool in the Weights & Biases ecosystem that enables automatic tracking of inputs and outputs from key functions, including grounding-specific parameters like dynamic retrieval thresholds, URLs, and confidence scores. With Weave, you gain deeper visibility into how your model's grounding mechanisms perform under different conditions, which is invaluable for tuning the system for improved reliability.
Using W&B Weave is straightforward:
- First, import Weave with import weave.
- Next, initialize Weave by adding weave.init("your_project_name") to your setup.
- Finally, to log the inputs and outputs of specific functions, simply apply the @weave.op decorator to those functions.
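Putting these three steps together, a minimal sketch might look like the following; the project name and function here are placeholders rather than anything specific to this tutorial:

import weave

weave.init("my_grounding_project")  # placeholder project name

@weave.op
def answer_question(prompt: str) -> str:
    # Inputs and outputs of this call are automatically logged to Weave
    return f"Echo: {prompt}"

answer_question("What is grounding?")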
Once Weave is set up, you can effortlessly capture each grounding request and response, gaining insight into the sources and confidence scores associated with every model output. This continuous logging mechanism provides you with a feedback loop to refine grounding parameters and improve model consistency, accuracy, and trustworthiness over time.
Tutorial: Implementing Gemini Grounding with Vertex AI and W&B Weave
In this tutorial, we’ll walk through the process of implementing Gemini Grounding with Vertex AI to enhance the accuracy and reliability of AI model responses. You'll learn how to set up your Google Cloud project, configure Gemini models to integrate data from Google Search and custom datasets, and enable grounding for responses that are more factual and context-aware.
Additionally, we’ll cover how to log and visualize model performance using W&B Weave, allowing for detailed tracking and fine-tuning of your model’s grounding process.
Set up your Google Cloud project and enable necessary APIs.
First, we'll cover the setup of Vertex AI, starting with creating a Google Cloud project, enabling the necessary APIs, and configuring the Google Cloud CLI. This foundation will ensure that you have the tools and permissions required to fully utilize Vertex AI's features. Then, we'll explore how to access and use your desired model via the Vertex AI platform. This includes setting up authentication, sending API requests, and handling the model's responses.
By the end of this tutorial, you'll be equipped with the knowledge to seamlessly integrate Vertex Grounding into your workflows, taking full advantage of its superior performance in various applications. Whether you're looking to enhance your coding processes, improve customer interactions, or gain deeper insights from data, this guide will help you harness the capabilities of one of the most advanced AI models available today.
Setting up Vertex AI on Google Cloud involves several key steps to ensure you have the necessary infrastructure and permissions in place.
Here’s how you can get started:
Step 1: Create a Google Cloud project
Begin by creating a new project in the Google Cloud console. Navigate to the project selector page and either select an existing project or create a new one. Ensure that billing is enabled for your project, as this is required for using Vertex AI services. If you haven't yet created a project, search 'create project' in the Google Cloud search bar and click the first result, which will guide you through creating a project.
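If you already have the gcloud CLI installed (covered in Step 3 below), a command-line sketch for creating and selecting a project might look like this, with a placeholder project ID:

gcloud projects create my-grounding-project
gcloud config set project my-grounding-project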

Step 2: Enable the Vertex AI API
Next, enable the Vertex AI API for your project. In the Google Cloud console, enter “Vertex AI” in the search bar. Select Vertex AI from the results, which will bring you to the Vertex AI dashboard. Click on “Enable All Recommended APIs” to activate the necessary APIs for Vertex AI. This process may take a few moments to complete.
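If you prefer the command line, the core Vertex AI API can also be enabled with gcloud; this covers the aiplatform.googleapis.com service, while the console's "Enable All Recommended APIs" button may enable additional ones:

gcloud services enable aiplatform.googleapis.com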

Step 3: Set up the Google Cloud CLI
To interact with Google Cloud services from your local development environment, you need to install the Google Cloud CLI. Download and install the CLI from the Google Cloud documentation. Once installed, initialize the CLI by running gcloud init in your terminal. This command will guide you through selecting your project and configuring your settings.
You can update the CLI components to ensure you have the latest tools and features by running:
gcloud components update
gcloud components install beta
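Before running the Python examples later in this tutorial, you will also need credentials that the Vertex AI client libraries can pick up. Setting up Application Default Credentials with the gcloud CLI is one common way to do this:

gcloud auth application-default login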
Step 4: Configure IAM Roles
The administrator must ensure the appropriate IAM roles are assigned. These roles include:
- Vertex AI User or Vertex AI Administrator, and
- Service Account User
The choice depends on your specific needs and intended use of Vertex AI; for this tutorial, I recommend the Vertex AI Administrator and Service Account User roles.
To accomplish this, simply search "IAM" in the Google Cloud Search bar...

You will then select the edit button next to your user account, which looks like the following:

...and assign the appropriate roles:

Note that the Discovery Engine Admin permission is only required for using grounding with personal data.
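If you prefer the command line, the same roles can be granted with gcloud. The sketch below assumes a placeholder project ID and user email; the role IDs correspond to Vertex AI Administrator and Service Account User:

gcloud projects add-iam-policy-binding my-grounding-project --member="user:you@example.com" --role="roles/aiplatform.admin"
gcloud projects add-iam-policy-binding my-grounding-project --member="user:you@example.com" --role="roles/iam.serviceAccountUser"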
Step 5: Enable the Gemini models in Vertex AI
As a final step, we now need to enable the Gemini models in Vertex AI. To do this, navigate to the Vertex AI console inside the Google Cloud console by searching "vertex ai" in the search bar, and selecting "Vertex AI" in the search results:

Next, select "Model Garden" in the left hand panel, and select the Gemini Models you would like to enable:

After selecting the model, click the "Enable" button to enable it.
Install the required libraries
Next, install the following packages:
pip install vertexai==1.71.1 google-cloud-aiplatform==1.72.0 google-auth==2.21.0 google-auth-oauthlib==1.0.0 wandb weave
Running inference with Gemini Web Grounding
Ok, now we are ready to get into the code, which will enable you to run Gemini with web search grounding. Here's the code that will run inference with grounding and Gemini 1.5 Pro:
import vertexai
from vertexai.generative_models import (
    GenerationConfig,
    GenerativeModel,
    Tool,
)
from google.cloud.aiplatform_v1beta1.types import tool as tool_types
import json
import weave

weave.init("gemini_grounding")

# Initialize Vertex AI with your project details
PROJECT_ID = "dsports-6ab79"
LOCATION = "us-central1"
vertexai.init(project=PROJECT_ID, location=LOCATION)

# Define your threshold for dynamic retrieval
DYNAMIC_THRESHOLD = 0.68  # Adjust this value as needed

@weave.op
def perform_inference_with_grounding(prompt):
    # Initialize the Gemini model
    model = GenerativeModel("gemini-1.5-pro-002")

    # Set up dynamic retrieval config with a threshold for Google Search grounding
    dynamic_retrieval_config = tool_types.DynamicRetrievalConfig(
        mode=tool_types.DynamicRetrievalConfig.Mode.MODE_DYNAMIC,
        dynamic_threshold=DYNAMIC_THRESHOLD,
    )

    # Configure GoogleSearchRetrieval with the dynamic retrieval config
    google_search_retrieval = tool_types.GoogleSearchRetrieval(
        dynamic_retrieval_config=dynamic_retrieval_config
    )

    # Create the Tool using the google_search_retrieval
    raw_tool = tool_types.Tool(google_search_retrieval=google_search_retrieval)

    # Create the Tool object using the _from_gapic method
    tool = Tool._from_gapic(raw_tool=raw_tool)

    # Generate content with grounding
    response = model.generate_content(
        prompt,
        tools=[tool],
        generation_config=GenerationConfig(
            temperature=0.0,
        ),
    )

    # Structure metadata and collect output text
    output_texts = []
    flat_metadata = []
    prediction_score = None  # Initialize prediction score variable

    # Loop over all candidates in the response
    if response and response.candidates:
        for candidate in response.candidates:
            output_text = "".join(part.text for part in candidate.content.parts)
            output_texts.append(output_text)

            # Extract grounding data
            grounding_chunks = candidate.grounding_metadata.grounding_chunks
            grounding_supports = candidate.grounding_metadata.grounding_supports

            # Retrieve prediction score if available
            prediction_score = 0.0
            if candidate.grounding_metadata.retrieval_metadata:
                prediction_score = candidate.grounding_metadata.retrieval_metadata.google_search_dynamic_retrieval_score

            for support in grounding_supports:
                # For each segment, create a flat structure with source details
                for i, index in enumerate(support.grounding_chunk_indices):
                    source_info = grounding_chunks[index].web
                    flat_metadata.append({
                        "text_segment": support.segment.text,
                        "source_title": source_info.title,
                        "source_url": source_info.uri,
                        "confidence": support.confidence_scores[i],
                    })

    return output_texts, flat_metadata, prediction_score

# Example usage
prompt = "what is a fibonacci sequence?"
output_texts, flat_metadata, prediction_score = perform_inference_with_grounding(prompt)

# Print output texts, structured flat metadata, and prediction score as JSON
print("Output Texts:")
print(json.dumps(output_texts, indent=2))
print("\nFlat Metadata:")
print(json.dumps(flat_metadata, indent=2))
print("\nPrediction Score:")
print(prediction_score)
This code initializes a Vertex AI-powered generative model, configuring it to use Google Search for grounding responses based on a dynamic retrieval threshold. When generating a response to a prompt, the model references external information sources when the predicted retrieval score meets or exceeds the configured threshold, indicating that grounding is likely to be beneficial.
The confidence scores, assigned to each grounded source in the response metadata, reflect the model's estimated accuracy and relevance of these sources. Higher confidence scores suggest that the model considers the information particularly relevant for the prompt, helping refine the credibility and reliability of the response.
Additionally, the prediction_score captures the model’s estimation of the necessity of grounding, offering insight into the effectiveness and accuracy of grounded responses. This setup enables evaluation and logging of output texts, sources, and their respective confidence scores using W&B Weave for further performance analysis.
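As a quick illustration of how you might use these values downstream, the sketch below (a continuation of the script above) flags whether grounding was triggered and counts high-confidence sources. Neither the helper nor the 0.5 cutoff is part of the Vertex AI API; both are assumptions for illustration:

# Illustrative helper, not part of the Vertex AI API; reuses flat_metadata,
# prediction_score, and DYNAMIC_THRESHOLD from the snippet above.
def summarize_grounding(flat_metadata, prediction_score, threshold=DYNAMIC_THRESHOLD):
    # Dynamic retrieval typically grounds the response when the score meets the threshold
    used_grounding = prediction_score is not None and prediction_score >= threshold
    # Keep only sources the model assigned a reasonably high confidence (arbitrary 0.5 cutoff)
    trusted = [m for m in flat_metadata if m["confidence"] is not None and m["confidence"] >= 0.5]
    return {"used_grounding": used_grounding, "trusted_sources": len(trusted)}

print(summarize_grounding(flat_metadata, prediction_score))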
Log grounding processes and model performance to W&B Weave
W&B Weave is used to log and track the grounding process, allowing you to monitor model inputs, outputs, and metadata throughout the workflow.
Initializing Weave with weave.init("gemini_grounding") at the beginning of the code ensures all operations are stored within the "gemini_grounding" project, making it easy to visualize and analyze grounding dynamics over time. By marking the perform_inference_with_grounding function with @weave.op, each invocation is automatically captured in Weave, which records the input prompt, generated response, prediction scores, and grounding metadata, including confidence scores for each source. This integration enables detailed tracking of how often grounding is triggered and how effectively it enhances response quality, which helps in adjusting thresholds and fine-tuning model performance.
If we navigate to the Weave Traces dashboard, we can click on our recent trace, and view the model's output along with the sources that it retrieved:

Using Weave with the Grounding API is a convenient way to understand how the system is retrieving sources and generating the final result. Weave also provides a nice UI that allows you to directly click the source URLs for further investigation!
Grounding with personal data
Now that we have successfully grounded our model queries with web search data, we can also create a system that utilizes internal data sources for grounding. This approach enhances the model’s ability to generate accurate responses by integrating custom data stored in Google Cloud. By incorporating proprietary information, the model can become more aligned with specific organizational needs, resulting in responses that are both contextually relevant and informed by curated data.
To start, we will need to enable a few other APIs in Google Cloud. First, enable Vertex AI Agent Builder by searching for it in the Google Cloud console:
Step 1: Enable Vertex AI Agent Builder

Now, you can select the enable button:

Step 2: Create a Data Store
Next, you will see the main agent builder screen, which will allow you to select the "data sources" panel on the left-hand side:

After clicking this, you will see the following screen, which will allow you to create the datastore:

You will now see a screen that contains a grid of several possible data sources. For this tutorial, you can choose the "Cloud Storage" option.

Next, you will need to select your Google Cloud Storage bucket. In this project, we will create a new bucket with the "Unstructured Documents" option. To create a new bucket, click the browse button at the bottom of the screen:

Next, click the create folder icon shown below:

After clicking the button, you will be able to name your bucket. I will name my bucket 'gemini_grounding' and leave the region as 'global.' After entering all of the necessary information, you can click 'Create.' After creating the data source, you will see it in the data sources list in your Agent Builder console:
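As a command-line alternative to the console flow, a bucket can also be created with gcloud; the bucket name and location below are placeholders (bucket names must be globally unique):

gcloud storage buckets create gs://your-grounding-bucket --location=us-central1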

If we click the name of the data source, we can see more information, like the data store ID, which we will need later on:

Step 3: Upload data to our Storage Bucket
Now that we have created our storage bucket, we are ready to upload some data to it.
For this tutorial, we will use a section of the W&B Weave docs from GitHub as our data source. The following code will download the data, convert all files to a .txt format, and upload them to our storage bucket. I'm not entirely sure converting every file into .txt format is necessarily a best practice for a system like this, but for our purposes it works well with the "unstructured document" mode that we selected earlier for our storage format. Here's the code that downloads the Weave docs and uploads them to our bucket.
Note: you will need to change BUCKET_NAME and PROJECT_ID to your respective bucket name and Google Cloud project ID.
import os
import subprocess
from google.cloud import storage
import time

# Set your Google Cloud project details and bucket name
PROJECT_ID = "dsports-6ab79"
BUCKET_NAME = "groundinggemini"

# Initialize the Cloud Storage client
storage_client = storage.Client(project=PROJECT_ID)
bucket = storage_client.bucket(BUCKET_NAME)

def download_weave_docs(repo_dir="weave_docs", retries=3):
    """Download Weave documentation using sparse checkout with shallow clone and retry logic."""
    success = False
    attempt = 0
    while not success and attempt < retries:
        attempt += 1
        print(f"Attempt {attempt} to download Weave documentation...")
        if not os.path.exists(repo_dir):
            os.makedirs(repo_dir)
        try:
            # Initialize the Git repository with sparse checkout
            subprocess.run(["git", "init"], cwd=repo_dir, check=True)
            subprocess.run(["git", "remote", "add", "origin", "https://github.com/wandb/weave.git"], cwd=repo_dir, check=True)
            subprocess.run(["git", "config", "core.sparseCheckout", "true"], cwd=repo_dir, check=True)
            subprocess.run(["git", "sparse-checkout", "set", "docs/docs/guides/tracking"], cwd=repo_dir, check=True)
            # Use a shallow clone by fetching only the latest commit
            subprocess.run(["git", "fetch", "--depth", "1", "origin", "master"], cwd=repo_dir, check=True)
            subprocess.run(["git", "checkout", "master"], cwd=repo_dir, check=True)
            success = True
            print("Weave documentation downloaded.")
        except subprocess.CalledProcessError:
            print("Download failed, retrying...")
            time.sleep(3)  # Wait a bit before retrying
            # Clean up any partial downloads to start fresh
            subprocess.run(["rm", "-rf", repo_dir])
    if not success:
        raise Exception("Failed to download Weave documentation after multiple attempts.")

def convert_to_text(filepath):
    """Attempts to read any file and save it as a .txt file."""
    txt_filepath = f"{filepath}.txt"
    try:
        # Try reading as plain text
        with open(filepath, 'r', encoding='utf-8') as file:
            text_content = file.read()
    except (UnicodeDecodeError, IOError):
        # Log and skip files that can't be read as text (e.g., binary files)
        print(f"Skipping unsupported or binary file: {filepath}")
        return None
    # Save the read content to a .txt file
    with open(txt_filepath, 'w', encoding='utf-8') as txt_file:
        txt_file.write(text_content)
    return txt_filepath

def upload_to_gcs(txt_filepath):
    """Uploads a .txt file to the specified Google Cloud Storage bucket."""
    blob_name = os.path.basename(txt_filepath)  # Use filename as blob name
    blob = bucket.blob(blob_name)
    blob.upload_from_filename(txt_filepath)
    print(f"Uploaded {txt_filepath} to gs://{BUCKET_NAME}/{blob_name}")

# Step 1: Download Weave docs
download_weave_docs()

# Step 2: Define the directory containing the downloaded files
documents_dir = "weave_docs/docs/docs/guides/tracking"  # Path to the downloaded Weave documentation

# Step 3: Process each document in the specified directory
for filename in os.listdir(documents_dir):
    filepath = os.path.join(documents_dir, filename)
    if os.path.isfile(filepath):
        txt_filepath = convert_to_text(filepath)  # Attempt to convert to .txt
        if txt_filepath:
            upload_to_gcs(txt_filepath)  # Upload the .txt file to Cloud Storage
            os.remove(txt_filepath)  # Optionally, delete the .txt file after uploading
This Python script automates the process of downloading and uploading documentation to Google Cloud Storage.
First, it initializes the Google Cloud Storage client with a specified project ID and bucket name. The script then uses sparse checkout and shallow cloning to download only the specific Weave documentation directory needed, retrying up to three times if necessary. Each converted file is uploaded to the designated Google Cloud Storage bucket, with the original .txt file removed after a successful upload. This setup creates a curated repository of documents in Google Cloud Storage that can later be referenced by a model for responding to queries.
To obtain the documentation needed for grounding, the code includes a function called download_weave_docs, which uses Git commands to clone only a specific section of the Weave GitHub documentation repository. By performing a “sparse checkout,” this function pulls just the relevant files without cloning the entire repository, making the process faster and more efficient. The function also has a retry mechanism to handle potential network or cloning issues, ensuring reliable access to the required documentation.
Once downloaded, the documentation files are converted to .txt format for compatibility with Vertex AI’s unstructured document processing requirements. The convert_to_text function reads each downloaded file and saves it in .txt format, handling errors gracefully by skipping unreadable or incompatible files, like binaries. This ensures that only usable text files are uploaded for grounding.
The final step is uploading the .txt files to Google Cloud Storage. The upload_to_gcs function takes each converted file and uploads it to the designated storage bucket, using the filename as the object name in the cloud. After each successful upload, the local `.txt` file can be deleted to manage storage efficiently. This entire setup prepares the Google Cloud Storage bucket as a structured data source that Vertex AI can access for grounded responses, enriching the model’s answers with contextually relevant, organization-specific information.
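As a quick sanity check that the upload worked, you can list the objects in the bucket, reusing the storage_client and BUCKET_NAME defined in the script above:

# List every object currently stored in the bucket
for blob in storage_client.list_blobs(BUCKET_NAME):
    print(blob.name)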
Step 4: Link our Storage bucket to Agent Builder
Now, if we navigate to the data store we created earlier, we can add the data we have just uploaded. Simply click the 'Import Data' button (shown in the previous screenshot) and select the "Cloud Storage" option:

Next, simply select your storage bucket that you created earlier, select "Unstructured documents" and click "import":

Now, Google Cloud will import your data, and you should see a success message:

Step 5: Create an Agent Builder App
Ok, before we use the Gemini API to search over our data, we will need to create an Agent Builder app. Simply navigate to the Agent Builder console, select the 'Apps' panel, and click "create a new app":

Next, select the "Search" type, and enter a name and region for your app:

I will use us-central1 for my region:

You do not need to enable "enterprise mode" for Agent builder in order to use the Gemini API with your custom documents. The Gemini API allows you to use your own data for grounding through Vertex AI Search.
With our datastore set up and populated with relevant documents, we’re now ready to code an integration with the Gemini API that enables our model to retrieve grounded information directly from the data we’ve uploaded. This approach leverages the Gemini model’s grounding capabilities to reference our curated internal data, allowing the model to generate responses that are contextually aligned with the information in our Google Cloud Storage.
Our code will initialize the Gemini model within Vertex AI, setting up the model to use Vertex AI Search for grounding. This grounding process enables the model to consult our datastore as it formulates responses, drawing on our internal data instead of, or in addition to, general web data. This setup is especially beneficial for tasks requiring responses based on proprietary information or specific organizational knowledge. With the configuration established, we’ll send a prompt to the model, and in return, we’ll receive responses enriched with grounded data from our Google Cloud Storage bucket.
Here's the code:
import vertexai
from vertexai.preview.generative_models import (
    GenerationConfig,
    GenerativeModel,
    Tool,
    grounding,
)
import json
import weave

weave.init("gemini_vertex_search_grounding")

# Initialize Vertex AI with your project details
PROJECT_ID = "dsports-6ab79"
DATA_STORE_ID = "gemini-grounding_1731397909128"
LOCATION = "us-central1"
vertexai.init(project=PROJECT_ID, location=LOCATION)

@weave.op
def perform_vertex_search_inference(prompt):
    # Initialize the Gemini model
    model = GenerativeModel("gemini-1.5-pro-002")

    # Set up the Vertex AI Search grounding tool
    tool = Tool.from_retrieval(
        grounding.Retrieval(
            grounding.VertexAISearch(
                datastore=DATA_STORE_ID,
                project=PROJECT_ID,
                location="global",
            )
        )
    )

    # Generate content with grounding
    response = model.generate_content(
        prompt,
        tools=[tool],
        generation_config=GenerationConfig(
            temperature=0.0,
        ),
    )

    # Structure metadata and collect output text
    output_texts = []
    flat_metadata = []

    # Loop over all candidates in the response
    if response and response.candidates:
        for candidate in response.candidates:
            # Get content parts and join them
            content_text = candidate.content.text if hasattr(candidate.content, 'text') else ""
            output_texts.append(content_text)

            # Extract grounding metadata
            if hasattr(candidate, 'grounding_metadata'):
                grounding_chunks = candidate.grounding_metadata.grounding_chunks
                grounding_supports = candidate.grounding_metadata.grounding_supports

                for support in grounding_supports:
                    # For each segment, create a flat structure with source details
                    for i, chunk_idx in enumerate(support.grounding_chunk_indices):
                        chunk = grounding_chunks[chunk_idx]
                        # Check if chunk has retrieved_context
                        if hasattr(chunk, 'retrieved_context'):
                            source_info = chunk.retrieved_context
                            flat_metadata.append({
                                "text_segment": support.segment.text,
                                "source_uri": source_info.uri,
                                "source_title": source_info.title,
                                "confidence": support.confidence_scores[i] if hasattr(support, 'confidence_scores') else None,
                            })

    return output_texts, flat_metadata

# Example usage
def run_vertex_search_query(prompt):
    output_texts, flat_metadata = perform_vertex_search_inference(prompt)

    # Print results in the same format as the example
    print("Model Response:")
    for text in output_texts:
        print(text)
    print("\nRetrieval Queries:", prompt)
    if flat_metadata:
        print("\nGrounding Metadata:")
        print(json.dumps(flat_metadata, indent=2))
    return output_texts, flat_metadata

# Example query
prompt = "What is wandb weave?"
results = run_vertex_search_query(prompt)
With the code snippet above, we have configured the Gemini model to search over our internal data store within Vertex AI. The DATA_STORE_ID in this setup references the specific Google Cloud data store we created and populated with relevant documents. This ID allows the Gemini model to access and retrieve information from our internal data source, grounding its responses in our uploaded content instead of relying solely on web-based data.
Within the code, we initialize Vertex AI with our project and location, then set up the Gemini model with Vertex AI’s Tool.from_retrieval method, which integrates Vertex AI Search as a grounding tool. This configuration ensures that the model will reference our custom data in Google Cloud Storage, using it to supplement or inform its responses to our queries. In this way, when we send a prompt, the model doesn’t just generate a general answer; it retrieves specific, relevant details from our data store, making the responses highly contextual and aligned with our stored information.
For each prompt, the perform_vertex_search_inference function processes the model’s response. It gathers the generated content, the grounding metadata, and other relevant details, including source URIs and confidence scores for the retrieved data segments. This structured information gives us insights into the specific documents and sections the model referenced, alongside the confidence level for each segment, ensuring transparency and traceability in grounded responses.
With Weave integrated into our setup, each invocation of the perform_vertex_search_inference function is automatically logged, providing a complete record of the grounding process. By using @weave.op on the function and initializing Weave with weave.init("gemini_vertex_search_grounding"), every input prompt, response, and associated metadata—such as grounding sources, confidence scores, and data segments—is captured within the Weave dashboard.
This logging setup allows us to visualize how the model interacts with our internal data store over time, making it easier to analyze the relevance of the information accessed and the consistency of grounded responses!
After running the above script, we can see the following data logged in Weave:

Conclusion
Grounding generative AI outputs through tools like Vertex AI's Gemini models represents a transformative step in ensuring accuracy, reliability, and contextual relevance in AI-driven interactions. By integrating web searches and internal data repositories, these systems move beyond the constraints of static training data, offering dynamic and verifiable responses.
This tutorial highlights the immense potential - and now ease - of combining robust AI models with innovative grounding methods to meet the demands of precision across industries. As developers harness these capabilities, they not only improve user trust but also lay the groundwork for more sophisticated and transparent AI applications, setting a new standard for generative AI excellence. I hope you enjoyed this tutorial!