Meta releases SAM 2
Meta's new segmentation model!
Meta has announced the release of SAM 2, the next iteration of its Segment Anything Model (SAM), which now supports real-time object segmentation in both images and videos. The new model improves on its predecessor in both accuracy and speed, and is shared openly under an Apache 2.0 license.
Unified Model for Real-Time Segmentation
SAM 2 is a unified model that handles object segmentation in real time across both images and videos, achieving higher accuracy while requiring significantly less interaction time than previous models. A standout feature is zero-shot generalization: SAM 2 can segment objects it has never encountered before, so it can be applied to new domains without custom fine-tuning.
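For images, the released code exposes a predictor with the same click-style prompting as the original SAM. Here is a minimal sketch of that workflow; the config and checkpoint file names are illustrative stand-ins for the files shipped with the sam2 repository, and exact names may differ between versions:

```python
import numpy as np
import torch
from PIL import Image
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

# Config and checkpoint names are placeholders; use the files that come
# with the checkpoints downloaded from the sam2 repository.
model = build_sam2("sam2_hiera_l.yaml", "checkpoints/sam2_hiera_large.pt")
predictor = SAM2ImagePredictor(model)

image = np.array(Image.open("example.jpg").convert("RGB"))

with torch.inference_mode():
    predictor.set_image(image)
    # A single foreground click (label 1) is the only prompt: no class
    # vocabulary and no fine-tuning, which is the zero-shot behavior
    # described above.
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[500, 375]]),
        point_labels=np.array([1]),
    )
```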
The release also includes the extensive SA-V dataset, which contains around 51,000 videos and more than 600,000 masklets (spatio-temporal segmentation masks). The dataset is available to the public along with the SAM 2 code, reflecting Meta's commitment to open science.
Broad Real-World Applications
The applications are wide-ranging: SAM 2 can power new video effects, aid scientific research, and speed up data annotation for training better computer vision systems. For instance, it can track objects through a clip for video editing, or segment moving cells in microscope footage.
Technical Enhancements
Technically, SAM 2 introduces several enhancements. It supports promptable visual segmentation: a user can prompt an object with points, a box, or a mask in any video frame, and the model predicts segmentation masks for that object across all frames. A streaming memory mechanism stores information about previous frames in a memory bank, keeping segmentation consistent as the object moves. The model is also highly efficient, making video annotation tasks roughly 8.4 times faster than with its predecessor.
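The video workflow makes this concrete. The sketch below follows the shape of the launch-era README in the sam2 repository: function names, file names, and the frame-directory input format are assumptions from that README and may change in later releases.

```python
import numpy as np
import torch
from sam2.build_sam import build_sam2_video_predictor

# Placeholder config/checkpoint names, as in the image example above.
predictor = build_sam2_video_predictor(
    "sam2_hiera_l.yaml", "checkpoints/sam2_hiera_large.pt"
)

# bfloat16 autocast as the repo recommends; assumes a CUDA device.
with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    # init_state loads the video (here, a directory of JPEG frames) and
    # sets up the memory bank used for tracking.
    state = predictor.init_state(video_path="./video_frames")

    # One positive click on frame 0 is the prompt; a mask for that frame
    # comes back immediately.
    _, obj_ids, masks = predictor.add_new_points(
        state,
        frame_idx=0,
        obj_id=1,
        points=np.array([[210, 350]], dtype=np.float32),
        labels=np.array([1], dtype=np.int32),  # 1 = foreground click
    )

    # Propagate across the video: the memory bank carries the object's
    # appearance forward so every frame gets a mask from that one click.
    for frame_idx, obj_ids, masks in predictor.propagate_in_video(state):
        pass  # consume per-frame masks here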
Interactive Refinement and Occlusion Handling
SAM 2 also supports interactive refinement: users can add prompts on any frame to iteratively correct the predicted masks, as sketched below. A dedicated occlusion head lets the model handle objects that are temporarily hidden from view, predicting when the target is not visible in a frame. The SA-V dataset, the largest of its kind, underpins the model's training and evaluation.
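Continuing the video sketch above, a correction is just another prompt: if the predicted masklet drifts on some frame, a negative click on that frame updates the object and propagation can be re-run. The frame index and coordinates here are hypothetical:

```python
# Hypothetical fix: suppose the mask bled onto the background around
# frame 60; add a negative click there on the same object id...
predictor.add_new_points(
    state,
    frame_idx=60,
    obj_id=1,
    points=np.array([[480, 120]], dtype=np.float32),
    labels=np.array([0], dtype=np.int32),  # 0 = background click
)

# ...then re-run propagation so the correction reaches the other frames.
for frame_idx, obj_ids, masks in predictor.propagate_in_video(state):
    pass
```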
Tags: ML News