
Using Segment Anything 2 with Weights & Biases

In this tutorial, we will cover how to use the Segment Anything 2 (SAM 2) model with Weights & Biases to log segmentation masks from automatic or prompted mask generation.
Created on September 13|Last edited on September 24
Figure: A llama wearing a Spider-Man outfit, holding a coffee cup and a coffee mug. We'll be masking it in a second.
In this article, we will showcase how to use the Segment Anything 2 model to generate segmentation masks and log them as Weights & Biases tables. In particular, we will use a CPU-compatible fork of the SAM 2 model, available here: SauravMaheshkar/samv2. You can install the package using the following snippet:
!pip install samv2
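The snippets below assume an RGB image loaded as a NumPy array named `image`, and an active W&B run named `run`. A minimal setup sketch might look like the following (the file name and project name are placeholders, not from the original notebook):

```python
import numpy as np

# In the notebook, the image would be loaded from disk, e.g.:
# from PIL import Image
# image = np.array(Image.open("llama.jpg").convert("RGB"))
# Here we use a dummy RGB array so the snippet is self-contained.
image = np.zeros((1024, 1024, 3), dtype=np.uint8)

# Start a W&B run before logging (project name is arbitrary):
# import wandb
# run = wandb.init(project="samv2-demo")

print(image.shape)  # (1024, 1024, 3)
```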
Let's walk through some example use cases showcasing mask generation and logging.

You can also play around with a web app here: lightly-ai/SAMv2-Mask-Generator

Automatic mask generation

You can see the code for automatic mask generation in the following notebook and a nice example below:

Open In Colab





The official implementation provides snippets for generating segmentation masks without any prompts, i.e., automatic mask generation. I've created a third-party library that contains a utility function encapsulating all use cases (automatic mask generation and the variants of prompted mask generation), making it easy to generate masks.
The following code snippet can be used to generate an output mask as shown above:
from sam2.automatic_mask_generator import SAM2AutomaticMaskGenerator
from sam2.build_sam import build_sam2
from sam2.utils.misc import variant_to_config_mapping
from sam2.utils.visualization import show_masks

# Build the "tiny" SAM 2 variant from a local checkpoint
model = build_sam2(
    variant_to_config_mapping["tiny"],
    "/content/sam2_hiera_tiny.pt",
)

mask_generator = SAM2AutomaticMaskGenerator(model)

# `image` is an RGB image loaded as a NumPy array
masks = mask_generator.generate(image)

output_mask = show_masks(
    image=image, masks=masks, scores=None, only_best=False, autogenerated_mask=True
)
Having generated the mask, we can log both the image and the output segmentation mask by adding them to a Weights & Biases table and logging the table.
import wandb

run = wandb.init(project="samv2")  # project name is arbitrary

columns = ["image", "mask"]
wandb_table = wandb.Table(columns=columns)
wandb_table.add_data(wandb.Image(image), wandb.Image(output_mask))
run.log({"samv2_automatic_mask_generation": wandb_table})
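As an aside, W&B can also render masks interactively on top of the image via the `masks` argument of `wandb.Image`, rather than logging a pre-rendered overlay. A minimal sketch of preparing that input, assuming the raw SAM 2 masks are boolean arrays (the class labels here are made up for illustration):

```python
import numpy as np

# Suppose SAM 2 returned a list of boolean masks; build a single 2-D
# integer label map where 0 is background and i + 1 marks the i-th mask.
masks_bool = [np.zeros((8, 8), dtype=bool) for _ in range(2)]
masks_bool[0][1:4, 1:4] = True
masks_bool[1][5:7, 5:7] = True

mask_data = np.zeros((8, 8), dtype=np.uint8)
for i, m in enumerate(masks_bool):
    mask_data[m] = i + 1

class_labels = {0: "background", 1: "object_1", 2: "object_2"}

# Then log the interactive overlay with:
# run.log({"overlay": wandb.Image(image, masks={
#     "predictions": {"mask_data": mask_data, "class_labels": class_labels}
# })})
print(int(mask_data.max()))  # 2
```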

Prompted Mask Generation

There are several ways to use the SAM 2 model in prompted mode. Let's initialize a SAM 2 model for prompted segmentation.
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor
from sam2.utils.misc import variant_to_config_mapping
from sam2.utils.visualization import show_masks

model = build_sam2(
    variant_to_config_mapping["tiny"],
    "/content/sam2_hiera_tiny.pt",
)
image_predictor = SAM2ImagePredictor(model)
image_predictor.set_image(image)
Now let's look at each possible usage in detail.
You can see the code for prompted mask generation in the following notebook.

Open In Colab



Perform Segmentation with a single point

We can provide a single point prompt to generate a mask around that point. The prompt consists of a coordinate pair within the image dimensions and a label denoting whether the point is a foreground or background point.
import numpy as np
import wandb

input_point = np.array([[300, 600]])
input_label = np.array([1])  # 1 = foreground, 0 = background

masks, scores, logits = image_predictor.predict(
    point_coords=input_point,
    point_labels=input_label,
    box=None,
    multimask_output=True,
)
sorted_ind = np.argsort(scores)[::-1]  # indices by descending score

output_mask = show_masks(image, masks, scores)

columns = ["image", "mask", "score"]
wandb_table = wandb.Table(columns=columns)
wandb_table.add_data(
    wandb.Image(image), wandb.Image(output_mask), scores[sorted_ind[0]]
)
run.log({"samv2_prompt_segmentation": wandb_table})


Having provided an input point on the cup on the left, we obtain a mask for the ceramic cup.
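With `multimask_output=True`, the predictor returns several candidate masks, and the `argsort` over `scores` above lets us pick the highest-scoring one. A small NumPy sketch of that selection, using dummy masks and scores:

```python
import numpy as np

# Dummy stand-ins for three candidate masks and their quality scores
masks = np.stack([np.full((4, 4), i, dtype=np.uint8) for i in range(3)])
scores = np.array([0.71, 0.93, 0.85])

sorted_ind = np.argsort(scores)[::-1]  # indices by descending score
best_mask = masks[sorted_ind[0]]       # the highest-scoring candidate

print(sorted_ind.tolist())   # [1, 2, 0]
print(int(best_mask[0, 0]))  # 1
```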

Perform Segmentation with Multiple Points

We can extend the same API to perform segmentation using multiple input points as the prompt. Instead of passing a single point, we provide a list of points and a list of labels, one per coordinate. This will lead to as many masks as there are points.
multi_point_coords = np.array([[300, 600], [700, 700]])
multi_point_labels = np.array([1, 1])

masks, scores, _ = image_predictor.predict(
    point_coords=multi_point_coords,
    point_labels=multi_point_labels,
    box=None,
    multimask_output=False,
)
sorted_ind = np.argsort(scores)[::-1]

output_mask = show_masks(image, masks, scores)

wandb_table.add_data(
    wandb.Image(image), wandb.Image(output_mask), scores[sorted_ind[0]]
)
run.log({"samv2_prompt_segmentation": wandb_table})


Since we provided points on both cups, the output contains a segmentation mask for each cup.
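A quick sanity check for multi-point prompts is that each prompt point should fall inside the mask it produced. A sketch with dummy boolean masks standing in for the SAM 2 output (the mask regions here are invented around the two prompt points used above):

```python
import numpy as np

# Dummy boolean masks roughly centered on the two prompt points (x, y)
masks = np.zeros((2, 900, 900), dtype=bool)
masks[0, 550:650, 250:350] = True  # region around (300, 600)
masks[1, 650:750, 650:750] = True  # region around (700, 700)
points = np.array([[300, 600], [700, 700]])

# Note: prompt points are (x, y), but arrays index as [row=y, col=x]
for i, (x, y) in enumerate(points):
    assert masks[i, y, x], f"point {i} is not inside its mask"
print("both prompt points lie inside their masks")
```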

Perform Segmentation using a single bounding box

We can also provide a bounding box as a prompt. Let's create a bounding box around the cup on the right and generate a segmentation mask. The API is similar, but instead of point coordinates we provide the corners of the box in XYXY format (top-left and bottom-right).
single_box_coords = np.array([656, 655, 798, 816])

masks, scores, _ = image_predictor.predict(
    point_coords=None,
    point_labels=None,
    box=single_box_coords,
    multimask_output=False,
)
sorted_ind = np.argsort(scores)[::-1]

output_mask = show_masks(image, masks, scores=None, display_image=False)

wandb_table.add_data(
    wandb.Image(image), wandb.Image(output_mask), scores[sorted_ind[0]]
)
run.log({"samv2_prompt_segmentation": wandb_table})


Having provided a bounding box around the cup on the right, we get the corresponding segmentation mask.
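A useful check in the other direction is to recompute the tight bounding box of the returned mask and compare it with the prompt box. A sketch with a dummy boolean mask filling exactly the prompted region:

```python
import numpy as np

# Dummy mask covering the prompted XYXY box [656, 655, 798, 816]
mask = np.zeros((900, 900), dtype=bool)
mask[655:817, 656:799] = True  # rows = y, cols = x

# Recover the tight bounding box of the mask in XYXY format
ys, xs = np.where(mask)
tight_box = [int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())]
print(tight_box)  # [656, 655, 798, 816]
```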

Perform Segmentation using multiple bounding boxes

Similarly, we can also pass in multiple bounding boxes as input. Let's create masks for both cups, as we did with multiple points.
multi_box_coords = np.array([[656, 655, 798, 816], [263, 518, 408, 653]])

masks, scores, _ = image_predictor.predict(
    point_coords=None,
    point_labels=None,
    box=multi_box_coords,
    multimask_output=False,
)
sorted_ind = np.argsort(scores)[::-1]

output_mask = show_masks(
    image, masks, scores=None, only_best=False, display_image=False
)
wandb_table.add_data(
    wandb.Image(image), wandb.Image(output_mask), scores[sorted_ind[0]]
)
run.log({"samv2_prompt_segmentation": wandb_table})



Perform Segmentation using a collection of boxes and points

We can also combine both boxes and points to generate masks using both inputs, as follows:
box = np.array([263, 518, 408, 653])
point = np.array([[300, 600]])
label = np.array([1])

masks, scores, _ = image_predictor.predict(
    point_coords=point,
    point_labels=label,
    box=box,
    multimask_output=False,
)
sorted_ind = np.argsort(scores)[::-1]

output_mask = show_masks(
    image, masks, scores=None, only_best=False, display_image=False
)
wandb_table.add_data(
    wandb.Image(image), wandb.Image(output_mask), scores[sorted_ind[0]]
)
run.log({"samv2_prompt_segmentation": wandb_table})


Having provided a point on the cup and a bounding box around it, we were able to generate a mask for the cup on the left.



Conclusion

In this article, we walked through a brief overview of using SAM 2 for automatic and prompted mask generation, and how to use Weights & Biases to log and store the resulting artifacts.
To see the full suite of W&B features, please check out this short 5-minute guide. If you want more reports covering the math and "from-scratch" code implementations, let us know in the comments down below or on our forum ✨!
Check out these other reports on Fully Connected covering LLM-related topics.
