
Using Segment Anything 2 with Weights & Biases

In this tutorial, we will cover how to use the Segment Anything 2 (SAM 2) model with Weights & Biases to log segmentation masks from automatic or prompted mask generation.
Created on September 13|Last edited on September 24
Figure: A llama wearing a Spider-Man outfit, holding a coffee cup and a coffee mug. We'll be masking it in a second.
In this article, we will showcase how to use the Segment Anything 2 model to generate segmentation masks and log them as Weights & Biases tables. In particular, we will use a CPU-compatible fork of the SAM 2 model, available here: SauravMaheshkar/samv2. You can install the package using the following snippet:
!pip install samv2
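The snippets below assume an RGB image loaded as a NumPy array named `image`, and an active W&B run named `run`. A minimal setup sketch might look like the following (the file name and project name are placeholders, not from the original notebook):

```python
import numpy as np

# In the notebook, the image would be loaded from disk, e.g.:
# from PIL import Image
# image = np.array(Image.open("llama.jpg").convert("RGB"))
# Here we use a dummy RGB array so the snippet is self-contained.
image = np.zeros((1024, 1024, 3), dtype=np.uint8)

# Start a W&B run before logging (project name is arbitrary):
# import wandb
# run = wandb.init(project="samv2-demo")

print(image.shape)  # (1024, 1024, 3)
```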
Let's walk through some example use cases showcasing mask generation and logging.

You can also play around with a web app here: lightly-ai/SAMv2-Mask-Generator

Automatic mask generation

You can see the code for automatic mask generation in the following notebook and a nice example below:

Open In Colab





The official implementation provides snippets for generating segmentation masks without any prompts, i.e., automatic mask generation. I've created a third-party library that contains a utility function encapsulating all use cases (automatic mask generation and the variants of prompted mask generation), making it easy to generate masks.
The following code snippet can be used to generate an output mask as shown above:
from sam2.automatic_mask_generator import SAM2AutomaticMaskGenerator
from sam2.build_sam import build_sam2
from sam2.utils.misc import variant_to_config_mapping
from sam2.utils.visualization import show_masks

# Build the "tiny" SAM 2 variant from a local checkpoint
model = build_sam2(
    variant_to_config_mapping["tiny"],
    "/content/sam2_hiera_tiny.pt",
)

mask_generator = SAM2AutomaticMaskGenerator(model)

# `image` is an RGB image loaded as a NumPy array
masks = mask_generator.generate(image)

output_mask = show_masks(
    image=image, masks=masks, scores=None, only_best=False, autogenerated_mask=True
)
Having generated the mask, we can log both the image and the output segmentation mask by adding them to a Weights & Biases table and logging the table.
import wandb

run = wandb.init(project="samv2")  # project name is arbitrary

columns = ["image", "mask"]
wandb_table = wandb.Table(columns=columns)
wandb_table.add_data(wandb.Image(image), wandb.Image(output_mask))
run.log({"samv2_automatic_mask_generation": wandb_table})
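As an aside, W&B can also render masks interactively on top of the image via the `masks` argument of `wandb.Image`, rather than logging a pre-rendered overlay. A minimal sketch of preparing that input, assuming the raw SAM 2 masks are boolean arrays (the class labels here are made up for illustration):

```python
import numpy as np

# Suppose SAM 2 returned a list of boolean masks; build a single 2-D
# integer label map where 0 is background and i + 1 marks the i-th mask.
masks_bool = [np.zeros((8, 8), dtype=bool) for _ in range(2)]
masks_bool[0][1:4, 1:4] = True
masks_bool[1][5:7, 5:7] = True

mask_data = np.zeros((8, 8), dtype=np.uint8)
for i, m in enumerate(masks_bool):
    mask_data[m] = i + 1

class_labels = {0: "background", 1: "object_1", 2: "object_2"}

# Then log the interactive overlay with:
# run.log({"overlay": wandb.Image(image, masks={
#     "predictions": {"mask_data": mask_data, "class_labels": class_labels}
# })})
print(int(mask_data.max()))  # 2
```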

Prompted Mask Generation

There are several ways to use the SAM 2 model in prompted mode. Let's initialize a SAM 2 model for prompted segmentation.
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor
from sam2.utils.misc import variant_to_config_mapping
from sam2.utils.visualization import show_masks

model = build_sam2(
    variant_to_config_mapping["tiny"],
    "/content/sam2_hiera_tiny.pt",
)
image_predictor = SAM2ImagePredictor(model)
image_predictor.set_image(image)
Now let's look at each possible usage in detail.
You can see the code for prompted mask generation in the following notebook.

Open In Colab



Perform Segmentation with a single point

We can provide a single point prompt to generate a mask around that point. The prompt consists of a coordinate pair within the image dimensions and a label denoting whether the point is a foreground or background point.
import numpy as np
import wandb

input_point = np.array([[300, 600]])
input_label = np.array([1])  # 1 = foreground, 0 = background

masks, scores, logits = image_predictor.predict(
    point_coords=input_point,
    point_labels=input_label,
    box=None,
    multimask_output=True,
)
sorted_ind = np.argsort(scores)[::-1]  # indices by descending score

output_mask = show_masks(image, masks, scores)

columns = ["image", "mask", "score"]
wandb_table = wandb.Table(columns=columns)
wandb_table.add_data(
    wandb.Image(image), wandb.Image(output_mask), scores[sorted_ind[0]]
)
run.log({"samv2_prompt_segmentation": wandb_table})


Having provided an input point on the cup on the left, we obtain a mask for the ceramic cup.
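With `multimask_output=True`, the predictor returns several candidate masks, and the `argsort` over `scores` above lets us pick the highest-scoring one. A small NumPy sketch of that selection, using dummy masks and scores:

```python
import numpy as np

# Dummy stand-ins for three candidate masks and their quality scores
masks = np.stack([np.full((4, 4), i, dtype=np.uint8) for i in range(3)])
scores = np.array([0.71, 0.93, 0.85])

sorted_ind = np.argsort(scores)[::-1]  # indices by descending score
best_mask = masks[sorted_ind[0]]       # the highest-scoring candidate

print(sorted_ind.tolist())   # [1, 2, 0]
print(int(best_mask[0, 0]))  # 1
```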

Perform Segmentation with Multiple Points

We can extend the same API to perform segmentation using multiple input points as the prompt. Instead of passing a single point, we provide a list of points and a list of labels, one per coordinate. This will lead to as many masks as there are points.
multi_point_coords = np.array([[300, 600], [700, 700]])
multi_point_labels = np.array([1, 1])

masks, scores, _ = image_predictor.predict(
    point_coords=multi_point_coords,
    point_labels=multi_point_labels,
    box=None,
    multimask_output=False,
)
sorted_ind = np.argsort(scores)[::-1]

output_mask = show_masks(image, masks, scores)

wandb_table.add_data(
    wandb.Image(image), wandb.Image(output_mask), scores[sorted_ind[0]]
)
run.log({"samv2_prompt_segmentation": wandb_table})


Since we provided points on both cups, the output contains a segmentation mask for each cup.
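A quick sanity check for multi-point prompts is that each prompt point should fall inside the mask it produced. A sketch with dummy boolean masks standing in for the SAM 2 output (the mask regions here are invented around the two prompt points used above):

```python
import numpy as np

# Dummy boolean masks roughly centered on the two prompt points (x, y)
masks = np.zeros((2, 900, 900), dtype=bool)
masks[0, 550:650, 250:350] = True  # region around (300, 600)
masks[1, 650:750, 650:750] = True  # region around (700, 700)
points = np.array([[300, 600], [700, 700]])

# Note: prompt points are (x, y), but arrays index as [row=y, col=x]
for i, (x, y) in enumerate(points):
    assert masks[i, y, x], f"point {i} is not inside its mask"
print("both prompt points lie inside their masks")
```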

Perform Segmentation using a single bounding box

We can also provide a bounding box as a prompt. Let's create a bounding box around the cup on the right and generate a segmentation mask. The API is similar, but instead of point coordinates we provide the corners of the box in XYXY format (top-left and bottom-right).
single_box_coords = np.array([656, 655, 798, 816])

masks, scores, _ = image_predictor.predict(
    point_coords=None,
    point_labels=None,
    box=single_box_coords,
    multimask_output=False,
)
sorted_ind = np.argsort(scores)[::-1]

output_mask = show_masks(image, masks, scores=None, display_image=False)

wandb_table.add_data(
    wandb.Image(image), wandb.Image(output_mask), scores[sorted_ind[0]]
)
run.log({"samv2_prompt_segmentation": wandb_table})


Having provided a bounding box around the cup on the right, we get the corresponding segmentation mask.
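A useful check in the other direction is to recompute the tight bounding box of the returned mask and compare it with the prompt box. A sketch with a dummy boolean mask filling exactly the prompted region:

```python
import numpy as np

# Dummy mask covering the prompted XYXY box [656, 655, 798, 816]
mask = np.zeros((900, 900), dtype=bool)
mask[655:817, 656:799] = True  # rows = y, cols = x

# Recover the tight bounding box of the mask in XYXY format
ys, xs = np.where(mask)
tight_box = [int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())]
print(tight_box)  # [656, 655, 798, 816]
```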

Perform Segmentation using multiple bounding boxes

Similarly, we can also pass in multiple bounding boxes as input. Let's create masks for both cups, as we did with multiple points.
multi_box_coords = np.array([[656, 655, 798, 816], [263, 518, 408, 653]])

masks, scores, _ = image_predictor.predict(
    point_coords=None,
    point_labels=None,
    box=multi_box_coords,
    multimask_output=False,
)
sorted_ind = np.argsort(scores)[::-1]

output_mask = show_masks(
    image, masks, scores=None, only_best=False, display_image=False
)
wandb_table.add_data(
    wandb.Image(image), wandb.Image(output_mask), scores[sorted_ind[0]]
)
run.log({"samv2_prompt_segmentation": wandb_table})



Perform Segmentation using a collection of boxes and points

We can also combine both boxes and points to generate masks using both inputs, as follows:
box = np.array([263, 518, 408, 653])
point = np.array([[300, 600]])
label = np.array([1])

masks, scores, _ = image_predictor.predict(
    point_coords=point,
    point_labels=label,
    box=box,
    multimask_output=False,
)
sorted_ind = np.argsort(scores)[::-1]

output_mask = show_masks(
    image, masks, scores=None, only_best=False, display_image=False
)
wandb_table.add_data(
    wandb.Image(image), wandb.Image(output_mask), scores[sorted_ind[0]]
)
run.log({"samv2_prompt_segmentation": wandb_table})


Having provided a point on the cup and a bounding box around it, we were able to generate a mask for the cup on the left.



Conclusion

In this article, we walked through a brief overview of using SAM 2 for automatic and prompted mask generation, and how to use Weights & Biases to log and store the resulting artifacts.
To see the full suite of W&B features, please check out this short 5-minute guide. If you want more reports covering the math and "from-scratch" code implementations, let us know in the comments down below or on our forum ✨!
Check out these other reports on Fully Connected covering LLM-related topics.
