
The Google GenAI SDK: A guide with a Python tutorial

Google’s GenAI SDK provides developers with a unified, flexible toolkit to seamlessly integrate advanced generative AI capabilities—including text, image, and video processing—into their applications using the latest Gemini models. This is a translated version of the article. Feel free to report any possible mis-translations in the comments section
Created on August 26|Last edited on August 26
Google’s GenAI SDK makes it easier for developers to bring generative AI features into their own apps. Whether you’re building for the web, mobile, or backend, the SDK gives direct access to Google’s Gemini models for tasks like text generation, summarization, image interpretation, and more. Instead of relying on external tools or APIs, you can now work with Google’s models directly inside your development environment.
In this article, we’ll cover how to set up the SDK, run inference with LLMs, and track the performance of your models during development and production using W&B Weave.





Understanding the Gemini API and Google's GenAI SDK

The new Google GenAI SDK offers substantial benefits over its predecessors, most notably providing a unified interface that works seamlessly across both the Gemini Developer API and Vertex AI. This architectural improvement means that developers can prototype applications using the Developer API and subsequently migrate to Vertex AI without requiring extensive code rewrites. The SDK's design philosophy emphasizes developer productivity and deployment flexibility, addressing common pain points experienced with previous SDK iterations where switching between different Google AI services required significant code modifications.
I found a helpful table which illustrates some of the core differences between the Vertex AI Gemini API and the Gemini Developer API. In short, the Gemini Developer API is intended for rapid prototyping and quick iteration, while Vertex AI is built for production-grade, enterprise-scale AI solutions.
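To make the "prototype on the Developer API, then move to Vertex AI" idea concrete, here is a minimal sketch. It assumes that `genai.Client` accepts `api_key` for the Developer API and `vertexai`, `project`, and `location` for Vertex AI. The helper names `client_kwargs` and `make_client` are hypothetical and not part of the SDK.

```python
from typing import Any, Dict, Optional


def client_kwargs(
    use_vertex: bool,
    api_key: Optional[str] = None,
    project: Optional[str] = None,
    location: str = "us-central1",
) -> Dict[str, Any]:
    """Build the keyword arguments for genai.Client for one backend."""
    if use_vertex:
        if not project:
            raise ValueError("Vertex AI needs a Google Cloud project ID")
        return {"vertexai": True, "project": project, "location": location}
    if not api_key:
        raise ValueError("The Gemini Developer API needs an API key")
    return {"api_key": api_key}


def make_client(**kwargs: Any):
    """Create the client; the SDK is imported lazily so this file loads without it."""
    from google import genai  # requires `pip install google-genai`

    return genai.Client(**client_kwargs(**kwargs))
```

Only the constructor arguments differ between the two backends; the calls made on the resulting client, such as `client.models.generate_content(...)`, are meant to stay the same.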

Furthermore, the GenAI SDK supports multiple programming languages including Python, Go, Node.js, and Java, ensuring that development teams can work within their preferred technology stacks. This multi-language support, combined with consistent API patterns across platforms, significantly reduces the learning curve for developers transitioning between different components of Google's AI ecosystem. The SDK also incorporates modern development best practices, including comprehensive error handling, streaming capabilities, and robust authentication mechanisms that simplify integration into existing applications.
The current recommendation for developers is to migrate to the Google GenAI SDK, as it represents the future direction of Google's AI development tools. The older google-generativeai package, while still functional, is being superseded by the new unified SDK (google-genai), which offers enhanced features, better performance, and more comprehensive support for Google's expanding AI model ecosystem. This transition reflects Google's commitment to providing developers with cutting-edge tools that can adapt to the rapidly evolving landscape of generative AI technology.

Obtaining a Gemini API Key

To access the Gemini API, you need a Gemini API key, which serves as the primary authentication method for Google’s generative AI services. The key is required to authorize requests and enables access to capabilities from text generation to advanced multimodal processing. You can obtain this key through Google AI Studio by signing in with your Google account and creating API keys without initial cost.

Once you have the Gemini API key, configure it in your development environment before using the GenAI Python SDK. The SDK supports multiple configuration methods: passing the key directly, setting environment variables, or using configuration files.

Installing and using the Google Gen AI SDK

To install the Google GenAI SDK on your system, run the following command:
pip install google-genai
In this tutorial, we will also use Weave. Install it with the following command:
pip install weave
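As a quick sanity check after installation, the following sketch reports which versions of the two packages are present. It uses only the standard library; the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report the two packages this tutorial installs
for name in ("google-genai", "weave"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```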

Configuring the Gemini Developer API

Now I'll show you how to create a client object, which serves as the primary interface for interacting with Gemini models. The genai.Client object handles authentication as well as communication with the Gemini API or Vertex AI backend, depending on your configuration. Here are a few ways you can create this object.

Option 1: Pass API key directly in code

from google import genai
client = genai.Client(api_key='YOUR_GEMINI_API_KEY')

Option 2: Using environment variables

You can set the necessary environment variable in your terminal or command prompt before running your application:
export GOOGLE_API_KEY='your-api-key'
Alternatively, set the environment variable programmatically at the start of your Python script:
import os
os.environ['GOOGLE_API_KEY'] = 'your-api-key'
from google import genai
client = genai.Client()
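If you want to support both options without hard-coding the key, a small lookup helper can sit in front of the client. This is a sketch under assumptions: `resolve_api_key` is a hypothetical helper, and checking `GEMINI_API_KEY` as a second variable name (after `GOOGLE_API_KEY`) is my assumption about a sensible order rather than something this article states.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    explicit: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick an API key: explicit argument first, then environment variables."""
    env = os.environ if env is None else env
    # Assumed lookup order; adjust to match your own conventions
    candidates = [explicit, env.get("GOOGLE_API_KEY"), env.get("GEMINI_API_KEY")]
    for value in candidates:
        if value and value.strip():
            return value.strip()
    raise RuntimeError("No API key found; set GOOGLE_API_KEY or pass one explicitly")
```

The result can then be passed along as `genai.Client(api_key=resolve_api_key())`, which fails early with a clear message when no key is configured.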

Configuring Gemini API via Vertex AI

If you would like to use Vertex AI as your backend, you will need to set up the gcloud CLI on your local system. Here’s how you can get started:

Step 1: Create a Google Cloud project

Begin by creating a new project in the Google Cloud console. Navigate to the project selector page and either select an existing project or create a new one. Ensure that billing is enabled for your project, as this is required for using Vertex AI services. If you haven't yet created a project, search 'create project' in the Google Cloud search bar and click the first result, which will guide you through creating one.



Step 2: Enable the Vertex AI API

Next, enable the Vertex AI API for your project. In the Google Cloud console, enter “Vertex AI” in the search bar. Select Vertex AI from the results, which will bring you to the Vertex AI dashboard. Click on “Enable All Recommended APIs” to activate the necessary APIs for Vertex AI. This process may take a few moments to complete.



Step 3: Set up the Google Cloud CLI

To interact with Google Cloud services from your local development environment, you need to install the Google Cloud CLI. Download and install the CLI from the Google Cloud documentation. Once installed, initialize the CLI by running gcloud init in your terminal. This command will guide you through selecting your project and configuring your settings.
You can update the CLI components to ensure you have the latest tools and features by running:
gcloud components update
gcloud components install beta

Step 4: Configure IAM Roles

Your project administrator must ensure the appropriate IAM roles are assigned. Depending on your specific needs and intended use of Vertex AI, these roles include:
  • Vertex AI User or Vertex AI Administrator, and
  • Service Account User
I recommend the Vertex AI Administrator and Service Account User roles for this tutorial.
To accomplish this, search "IAM" in the Google Cloud search bar and open the IAM page.

You will then select the edit button next to your user account, which looks like the following:

And assign the appropriate roles:


Creating a Client using Vertex AI

If you access Gemini through Vertex AI, you need to set three environment variables:
export GOOGLE_GENAI_USE_VERTEXAI=true
export GOOGLE_CLOUD_PROJECT='your-project-id'
export GOOGLE_CLOUD_LOCATION='us-central1'
Or set them programmatically in Python before initializing the client:
import os
from google import genai

os.environ['GOOGLE_GENAI_USE_VERTEXAI'] = 'true'
os.environ['GOOGLE_CLOUD_PROJECT'] = 'your-project-id'
os.environ['GOOGLE_CLOUD_LOCATION'] = 'us-central1'

client = genai.Client()
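Because the Vertex AI path depends on all three variables, it can help to check them before creating the client. The following is a minimal sketch; `missing_vertex_vars` is a hypothetical helper, and it only checks that the variables are non-empty, not that the project or region is valid.

```python
import os
from typing import List, Mapping, Optional

# The three variables named in this section
REQUIRED_VERTEX_VARS = (
    "GOOGLE_GENAI_USE_VERTEXAI",
    "GOOGLE_CLOUD_PROJECT",
    "GOOGLE_CLOUD_LOCATION",
)


def missing_vertex_vars(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required Vertex AI variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VERTEX_VARS if not env.get(name)]


missing = missing_vertex_vars()
if missing:
    print("Set these before creating the client:", ", ".join(missing))
```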
Now that we have covered the basic setup for these two backends, we can move on to writing some code that will allow us to run inference with Gemini models!

Making Your First Request

After setting up your environment and client, you can ask Gemini to generate responses, a process known as inference. This involves sending a prompt—and optionally an image—to the model, and receiving a generated reply. Gemini supports both text-only and multimodal interactions, allowing it to handle prompts that include images for tasks like detailed scene descriptions.
Here's some code that will allow you to run inference:
import requests
from io import BytesIO
from typing import Optional
from PIL import Image
from google import genai
from google.genai import types

import weave

weave.init("google_genai")


@weave.op
def gemini_infer(
    prompt: str,
    api_key: str,
    image: Optional[Image.Image] = None,
    model: str = "gemini-2.0-flash-001",
) -> str:
    """
    Run Gemini inference with an optional PIL image.

    :param prompt: The user prompt/question/command
    :param api_key: Your Gemini API key
    :param image: An optional PIL Image object
    :param model: Model string (default: "gemini-2.0-flash-001")
    :return: Model's text response
    """
    client = genai.Client(api_key=api_key)
    # Assemble contents
    if image is not None:
        # JPEG has no alpha channel, so convert to RGB before encoding
        buf = BytesIO()
        image.convert("RGB").save(buf, format="JPEG")
        image_bytes = buf.getvalue()
        contents = [
            types.Part.from_bytes(data=image_bytes, mime_type="image/jpeg"),
            prompt,
        ]
    else:
        contents = [prompt]
    # Run inference
    response = client.models.generate_content(
        model=model,
        contents=contents,
    )
    return response.text

# ---- Example usage ----

if __name__ == "__main__":
    API_KEY = "your_api_key"  # <-- put your key here
    # Download example image to PIL
    img_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/3/3f/Fronalpstock_big.jpg/800px-Fronalpstock_big.jpg"
    headers = {"User-Agent": "Mozilla/5.0 (compatible; GeminiScript/1.0)"}
    response = requests.get(img_url, headers=headers, timeout=30)
    response.raise_for_status()
    pil_img = Image.open(BytesIO(response.content))

    # With image
    r1 = gemini_infer(
        prompt="Describe this alpine scenery in detail.",
        api_key=API_KEY,
        image=pil_img,
    )
    print("\nGemini vision response:")
    print(r1)

    # Text-only
    r2 = gemini_infer(
        prompt="Write a haiku about mountains.",
        api_key=API_KEY,
    )
    print("\nGemini text-only response:")
    print(r2)

With your inference setup in place, you can easily experiment with different prompts and images. The model will return responses tailored to your request, showcasing Gemini’s flexibility in both creative and analytical tasks. This process forms the backbone of integrating Gemini-powered features into your projects.
After running the script, you can navigate to W&B Weave and view all of the inputs and outputs to our model! Here's a screenshot inside Weave:

Weave even shows me more information like costs for each LLM call and latency data on the call. If you are building LLM apps, I definitely think it's important to utilize some sort of "tracing" framework which will store the data about how your model is performing. This will allow your team to analyze data and make changes to your system based on actual usage patterns and model output quality, rather than just intuition. Adopting a robust tracing framework also makes it easier to debug misbehaving prompts or input data, track improvements as you iterate on your application, and provide transparency if you need to explain system behavior to users or stakeholders.
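To illustrate what a tracing layer records, here is a toy sketch that captures the inputs, output, and latency of each call. It is not how Weave is implemented; `traced`, `TRACE_LOG`, and `fake_model` are hypothetical names, and the model call is a stand-in so the example runs offline.

```python
import functools
import time
from typing import Any, Callable, Dict, List

# In-memory store; a real tracing tool would persist and visualize this
TRACE_LOG: List[Dict[str, Any]] = []


def traced(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Record inputs, output, and latency of each call in TRACE_LOG."""

    @functools.wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        TRACE_LOG.append(
            {
                "op": fn.__name__,
                "inputs": {"args": args, "kwargs": kwargs},
                "output": result,
                "latency_s": time.perf_counter() - start,
            }
        )
        return result

    return wrapper


@traced
def fake_model(prompt: str) -> str:
    """Stand-in for a model call so the sketch runs offline."""
    return prompt.upper()


fake_model("write a haiku about mountains")
print(TRACE_LOG[-1]["op"], TRACE_LOG[-1]["latency_s"])
```

Keeping a record like this per call is what makes it possible to compare prompts, spot slow requests, and reproduce a bad output later.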

Generating Videos with Veo 2

Using the Gemini API, we can also access Veo 2 and generate videos! Here's a script allowing me to generate a video and also log it to Weave! Just use the moviepy format and your videos will be logged to Weave automatically!
import time
import weave
# Note: moviepy 2.x removed the `moviepy.editor` module; there, use `import moviepy as mpy`
import moviepy.editor as mpy
from google import genai
from google.genai import types


API_KEY = "your_api_key"  # <-- your Gemini key
weave.init("gemini-genai")


@weave.op
def gemini_generate_video(prompt: str, api_key: str, fname: str = "video0.mp4") -> mpy.VideoFileClip:
    client = genai.Client(api_key=api_key)
    # Start the asynchronous video generation job
    op = client.models.generate_videos(
        model="veo-2.0-generate-001",
        prompt=prompt,
        config=types.GenerateVideosConfig(
            person_generation="dont_allow",
            aspect_ratio="16:9",
        ),
    )
    # Poll every 10 seconds until the operation reports completion
    while not op.done:
        time.sleep(10)
        op = client.operations.get(op)

    # Download the first generated video and save it locally
    vid = op.response.generated_videos[0]
    client.files.download(file=vid.video)
    vid.video.save(fname)
    # Returning a moviepy clip lets Weave log the video
    return mpy.VideoFileClip(fname)


# --- Example usage ---

clip = gemini_generate_video("rotating red cube in front of stars", API_KEY)

The process starts by connecting to the API with your key and sending a prompt to the generate_videos method on the Veo model. The API returns an asynchronous operation—you periodically poll this operation until the video is ready. Once finished, the generated video is downloaded and saved locally (for example, as an MP4 file). Integrating Weave allows you to log and track each video generation for observability and experiment management. If you use the moviepy format, your videos are automatically compatible with Weave, making it easy to analyze, share, or visualize results directly from your workflows.
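The polling step described above loops until the operation finishes, which can wait indefinitely if a job stalls. Below is a minimal sketch of the same pattern with a timeout added. `poll_until_done` is a hypothetical helper that works on any object, so it can be exercised here with a fake operation instead of a real API call.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def poll_until_done(
    operation: T,
    refresh: Callable[[T], T],
    is_done: Callable[[T], bool],
    interval_s: float = 10.0,
    timeout_s: float = 600.0,
) -> T:
    """Refresh `operation` every `interval_s` seconds until done or timed out."""
    deadline = time.monotonic() + timeout_s
    while not is_done(operation):
        if time.monotonic() >= deadline:
            raise TimeoutError(f"operation not finished after {timeout_s} s")
        time.sleep(interval_s)
        operation = refresh(operation)
    return operation


# Offline demo: a fake operation that completes after three refreshes
fake_op = {"polls": 0}
finished = poll_until_done(
    fake_op,
    refresh=lambda op: {"polls": op["polls"] + 1},
    is_done=lambda op: op["polls"] >= 3,
    interval_s=0.0,
)
print(finished)
```

With the video script, this would be called roughly as `poll_until_done(op, client.operations.get, lambda o: bool(o.done))`.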
Here's what it looks like inside Weave:

One great feature of Weave is that it also redacts API keys, so you don't need to worry about an accidental API key leak when using Weave.

Using Gemini for Video Understanding

Several Gemini models can also understand video! Here's a script that runs inference with Gemini and returns a summary of the video we generated in the previous section.
from google import genai
from google.genai import types
import weave

weave.init("gemini-video-analysis")

API_KEY = "your_api_key"  # <- Replace with your key

# Inline upload only works for videos smaller than 20 MB
video_file_name = "./video0.mp4"
with open(video_file_name, "rb") as f:
    video_bytes = f.read()


@weave.op
def gemini_summarize_video(video_bytes: bytes, prompt: str, api_key: str):
    client = genai.Client(api_key=api_key)
    # Send the raw video bytes and the text prompt as two parts of one message
    response = client.models.generate_content(
        model='models/gemini-2.0-flash',
        contents=types.Content(
            parts=[
                types.Part(
                    inline_data=types.Blob(data=video_bytes, mime_type='video/mp4')
                ),
                types.Part(text=prompt),
            ]
        ),
    )
    return response.text


# Logging and inference via Weave
summary = gemini_summarize_video(video_bytes, "Please summarize the video in 3 sentences.", API_KEY)
print(summary)

With just a few lines, you can feed a video right into Gemini and get back a smart summary in natural language. Simply provide your video and the prompt, and Gemini handles the heavy lifting—analyzing the visuals and distilling the content into clear, concise sentences.
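Since the script's comment limits inline upload to videos under 20 MB, a size check before reading the file can save a failed request. This is a sketch with a hypothetical helper, `can_send_inline`. Treating the limit as applying to the file alone is a simplification, and for larger files I am assuming you would upload through the SDK's Files API (for example `client.files.upload`) instead.

```python
import os

# The "<20 MB" inline limit noted in the script's comment
INLINE_LIMIT_BYTES = 20 * 1024 * 1024


def can_send_inline(path: str, limit_bytes: int = INLINE_LIMIT_BYTES) -> bool:
    """Return True if the file is small enough to send as inline bytes."""
    return os.path.getsize(path) < limit_bytes


if os.path.exists("./video0.mp4"):
    print("inline OK" if can_send_inline("./video0.mp4") else "too large for inline upload")
```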


Using "Thinking Mode" with the Google GenAI API

We can also utilize the thinking feature with Gemini 2.5 Flash and Gemini 2.5 Pro with the Google GenAI API. You can switch between using Vertex AI and the standard API key depending on your access, and you have the option to receive either streaming or standard (non-streaming) responses. Here's the code:
import os
from google import genai
from google.genai import types
import weave

# --- CONFIG: Toggle here ---
USE_VERTEX = True  # Set to False for the Gemini Developer API
STREAMING = True  # Set to False for normal responses

PROJECT = "your-google-cloud-project"
LOCATION = "us-central1"
API_KEY = "your-api-key"  # Only used if USE_VERTEX=False


weave.init("gemini-genai")

# --- SETUP ENV VARIABLES ---
if USE_VERTEX:
    os.environ["GOOGLE_CLOUD_PROJECT"] = PROJECT
    os.environ["GOOGLE_CLOUD_LOCATION"] = LOCATION
    os.environ["GOOGLE_GENAI_USE_VERTEXAI"] = "True"
else:
    os.environ["GOOGLE_GENAI_USE_VERTEXAI"] = "False"

# --- Initialize client ---
if USE_VERTEX:
    client = genai.Client()
else:
    client = genai.Client(api_key=API_KEY)

# The same model ID is used for both backends
MODEL = "gemini-2.5-pro-preview-06-05"


@weave.op
def run_gemini(
    prompt,
    streaming=STREAMING,
    model=MODEL,
    thinking_budget=1024,
    include_thoughts=True,
):
    """
    Run a prompt through Gemini (Vertex or Developer API),
    optionally using streaming, and configurable 'thinking_budget'
    and 'include_thoughts'.

    Returns (thoughts, answer) as strings.
    """
    # The thinking configuration is identical for both modes
    config = types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(
            include_thoughts=include_thoughts,
            thinking_budget=thinking_budget,
        )
    )
    if streaming:
        # Streaming yields partial responses as they are generated
        responses = client.models.generate_content_stream(
            model=model, contents=prompt, config=config
        )
    else:
        # Wrap the single response so both modes share one loop
        responses = [
            client.models.generate_content(model=model, contents=prompt, config=config)
        ]

    thoughts = ""
    answer = ""
    for chunk in responses:
        # Skip chunks that carry no candidates or no content
        if not chunk.candidates or not chunk.candidates[0].content:
            continue
        for part in chunk.candidates[0].content.parts or []:
            if not part.text:
                continue
            if part.thought:
                # Thought summaries are flagged with part.thought
                if not thoughts:
                    print("Thoughts summary:")
                print(part.text)
                thoughts += part.text
            else:
                if not answer:
                    print("Answer:")
                print(part.text)
                answer += part.text
    return thoughts, answer

# -------- EXAMPLES ---------
if __name__ == "__main__":
    # Algebra example
    prompt1 = "Solve x^2 + 4x + 4 = 0"
    print("\nAlgebra Example:")
    run_gemini(
        prompt1,
        streaming=True,
        thinking_budget=512,  # You can override per-call
        include_thoughts=True,  # You can override per-call
    )

    # Primes example, with less thinking budget and thoughts off
    prompt2 = "What is the sum of the first 50 prime numbers?"
    print("\nPrimes Example:")
    run_gemini(
        prompt2,
        streaming=False,
        # Raised from 32: Gemini 2.5 Pro is documented with a minimum budget of 128.
        # Lower values may work on 2.5 Flash.
        thinking_budget=128,
        include_thoughts=False,
    )

    # Logic puzzle example -- uses the default streaming setting
    prompt3 = """
Alice, Bob, and Carol each live in a different house on the same street: red, green, and blue.
The person who lives in the red house owns a cat.
Bob does not live in the green house.
Carol owns a dog.
The green house is to the left of the red house.
Alice does not own a cat.
Who lives in each house, and what pet do they own?
"""
    print("\nLogic Puzzle Example:")
    run_gemini(
        prompt3,
        thinking_budget=1024,
        include_thoughts=True,
    )
Central to the script is the run_gemini function, which allows you to direct how the model reasons through each prompt. By adjusting the thinking_budget, you control how thoroughly Gemini explores the problem, and by toggling whether the model’s internal “thoughts” are included, you can choose to see a step-by-step explanation or just the final answer.
To use the script, simply provide a prompt—such as a math equation, a logic puzzle, or a general question. Customize the reasoning depth and thoughts display to match your needs. For example, you can see detailed intermediate steps as Gemini works through a complex problem, and with streaming mode, these thoughts and answers will appear in real time. This script is useful for anyone interested in exploring the model's reasoning process and understanding not only Gemini's answers, but also how it arrives at them.
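The core of the script is the loop that separates thought summaries from answer text. The sketch below isolates that logic so it can be tried without an API call. `FakePart` is a hypothetical stand-in for a response part (it only mirrors the `text` and `thought` attributes the script reads), and `split_thoughts` is likewise a made-up helper name.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class FakePart:
    """Hypothetical stand-in for a response part: text plus a thought flag."""

    text: Optional[str]
    thought: bool = False


def split_thoughts(parts: Iterable[FakePart]) -> Tuple[str, str]:
    """Concatenate thought parts and answer parts separately, skipping empty ones."""
    thoughts: List[str] = []
    answer: List[str] = []
    for part in parts:
        if not part.text:
            continue
        (thoughts if part.thought else answer).append(part.text)
    return "".join(thoughts), "".join(answer)


demo_parts = [
    FakePart("Factor the quadratic. ", thought=True),
    FakePart(None),
    FakePart("x = -2"),
]
print(split_thoughts(demo_parts))
```

Because streamed chunks arrive in order, accumulating the two strings this way reproduces the same (thoughts, answer) pair in both streaming and non-streaming modes.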

Conclusion

The rapid evolution of large language models and generative AI is fundamentally changing how developers approach building applications, and Google’s GenAI SDK stands at the forefront of this transformation. By offering unified access to cutting-edge Gemini models for tasks ranging from text and image generation to video understanding, the SDK empowers teams to prototype rapidly and scale seamlessly into production. Its thoughtful design bridges the gap between experimentation and deployment, allowing developers to work with a consistent toolset regardless of whether they are using the Gemini Developer API or Vertex AI.
As the capabilities of these models continue to expand, embracing robust tools like the GenAI SDK will be essential for staying competitive and creating compelling, intelligent user experiences. Whether you’re optimizing workflows, building creative applications, or integrating advanced analysis into existing products, the Google GenAI SDK provides a solid foundation for harnessing the latest advances in AI—directly within your codebase and with the flexibility required for tomorrow’s challenges.