Revolutionizing Medical Imaging with SAM and W&B
Explore how the Segment Anything Model (SAM) and Weights & Biases (W&B) are transforming medical imaging, with a focus on automated tumor detection and segmentation
Created on May 27|Last edited on May 27

Source: Llama 3
In the dynamic field of medical imaging, accurate segmentation and analysis of scans play a pivotal role in diagnosing and treating various conditions. Conventional methods often struggle with the intricate details and complexity of medical images, leading to potential errors and inefficiencies. However, recent advancements in artificial intelligence (AI) have paved the way for revolutionary techniques that promise to transform the landscape of medical imaging.
One such groundbreaking development is the Segment Anything Model (SAM), a cutting-edge AI model developed by Meta AI Research. This remarkable tool has the ability to segment virtually any object in an image with unprecedented accuracy, even if it hasn't been explicitly trained on that specific object category. By harnessing the power of SAM, medical professionals can unlock new frontiers in image analysis, enabling faster and more precise diagnoses, as well as more effective treatment planning.
From detecting and delineating tumors in MRI scans to identifying anomalies in X-rays or CT scans, SAM's versatility promises to revolutionize the way medical imaging is approached, ultimately leading to better patient outcomes.
However, to fully realize the potential of this groundbreaking technology, it is crucial to integrate it with robust tools for experiment tracking, model versioning, and performance monitoring. This is where Weights & Biases (W&B) comes into play, offering a comprehensive platform for managing and optimizing AI models throughout their lifecycle. W&B enables researchers and practitioners to track and visualize model performance, compare different model versions, and collaborate effectively, ensuring reproducibility and facilitating continuous improvement.
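As a minimal sketch of that workflow (the project name and logged metric below are purely illustrative), a W&B run boils down to three calls: init, log, and finish.
import wandb

# Start a run (illustrative project name); config records hyperparameters.
run = wandb.init(project="sam-medical-imaging", config={"model_type": "vit_h"})

# Log any metric over time - here a made-up segmentation score per step.
for step in range(3):
    wandb.log({"example_dice_score": 0.90 + 0.01 * step})

# Close the run so results are synced to the W&B dashboard.
run.finish()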
This article will delve into the functionalities of SAM and W&B, explore their applications in medical imaging, and provide a practical implementation guide for using these tools to segment tumors in MRI scans.
What is SAM?

The Segment Anything Model (SAM) is a groundbreaking artificial intelligence model developed by Meta AI Research. It is a state-of-the-art visual segmentation model that can accurately segment objects in an image with remarkable precision, even if it has not been explicitly trained on those specific object categories.
SAM is a transformer-based, promptable model designed for zero-shot transfer. Unlike traditional segmentation models that require extensive training on specific object categories, SAM adapts to diverse segmentation tasks by following prompts provided by users, such as points, bounding boxes, or rough masks (free-form text prompts were explored in the original paper but are not part of the released model).
The model's architecture combines a Vision Transformer image encoder, a prompt encoder, and a lightweight mask decoder, allowing it to understand the visual context of an image and generate accurate segmentation masks for the objects of interest. This is achieved in two stages: first, the image encoder computes an embedding of the entire image, and then the mask decoder combines that embedding with the encoded prompts to generate high-quality segmentation masks for the prompted regions.
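To make this concrete, here is a minimal sketch of the promptable workflow using the segment-anything package's SamPredictor (the checkpoint path, image file, and point coordinates are placeholders; the tutorial later in this article uses the same API with box prompts):
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

# Load the pretrained model (assumes the vit_h checkpoint has been downloaded).
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth").to("cuda")
predictor = SamPredictor(sam)

# Stage 1: embed the whole image once with the ViT image encoder.
image = cv2.cvtColor(cv2.imread("scan.png"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# Stage 2: prompt the mask decoder - here with a single foreground point.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[128, 128]]),  # (x, y) pixel location of interest
    point_labels=np.array([1]),           # 1 = foreground point
    multimask_output=True,                # return several candidate masks
)
print(masks.shape, scores)  # e.g. (3, H, W) boolean masks with per-mask quality scores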
Key Features and Functionality
SAM's key features and functionality include:
- Zero-shot Transfer Learning: SAM can segment objects and structures it has never encountered during training, generalizing from its large-scale pretraining and from user prompts rather than relying on a fixed set of object categories.
- Prompt-based Segmentation: Users can indicate the objects they want to segment with simple prompts such as points, bounding boxes, or rough masks, allowing for a highly flexible and intuitive interaction with the model.
- Multi-Object Segmentation: SAM can segment multiple objects within the same image, making it suitable for complex scenarios where multiple regions of interest need to be identified (see the sketch after this list).
- High Accuracy and Robustness: The model achieves state-of-the-art performance in segmentation tasks, with high accuracy and robustness to variations in object appearance, occlusions, and challenging backgrounds.
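For the multi-object case, the segment-anything package also ships a SamAutomaticMaskGenerator that prompts the model with a grid of points and returns one mask per detected region. A minimal sketch (the checkpoint path and image file are placeholders):
import cv2
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

# Load the pretrained model and wrap it in the automatic mask generator.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth").to("cuda")
mask_generator = SamAutomaticMaskGenerator(sam)

image = cv2.cvtColor(cv2.imread("scan.png"), cv2.COLOR_BGR2RGB)
masks = mask_generator.generate(image)  # list of dicts with 'segmentation', 'area', 'bbox', ...
print(f"Found {len(masks)} masks; largest area: {max(m['area'] for m in masks)}")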
Advantages in Medical Imaging
The application of SAM in the field of medical imaging offers several advantages:
- Versatility: SAM can segment a wide range of anatomical structures, lesions, and abnormalities without the need for extensive training on specific medical imaging modalities or pathologies.
- Time and Cost Savings: By automating the segmentation process, SAM can significantly reduce the time and effort required for manual annotation, leading to increased efficiency and cost savings in medical imaging workflows.
- Improved Accuracy and Consistency: SAM's high accuracy and robustness can lead to more reliable and consistent segmentation results, reducing the potential for human error and enabling more accurate diagnoses and treatment planning.
- Adaptability: As new imaging modalities or pathologies emerge, SAM can quickly adapt to these changes without requiring extensive retraining, making it a future-proof solution for the ever-evolving field of medical imaging.
By leveraging SAM's capabilities, medical professionals can streamline their image analysis workflows, enabling faster and more accurate diagnoses as well as more effective treatment planning and monitoring. The model's versatility and adaptability make it a powerful tool for addressing the diverse challenges encountered in medical imaging.
SAM Architecture

While the intricate details of SAM's architecture are complex, it's essential to understand the fundamental principles that underlie its remarkable performance. At a high level, SAM employs a transformer-based encoder-decoder architecture, which allows it to effectively capture and integrate visual and linguistic information.
The model's architecture can be broken down into three main components: the Vision Transformer (ViT) image encoder, the prompt encoder, and the mask decoder.
1. Vision Transformer (ViT) Encoder:

- The ViT encoder is responsible for extracting visual features from the input image.
- It consists of a series of transformer blocks that process the image in a sequence-to-sequence manner, capturing long-range dependencies and contextual information.
- The encoder's output is a set of rich visual representations that encode the spatial and semantic information present in the image.
2. Prompt Encoder:

- The prompt encoder embeds the user-provided prompts (points, bounding boxes, or rough masks) into the same representation space as the image features.
- Points and boxes are represented with positional encodings combined with learned embeddings, while mask prompts are embedded with a small convolutional network.
3. Mask Decoder:

- The mask decoder takes the visual representations from the ViT encoder and the prompt embeddings from the prompt encoder as input.
- It employs a lightweight transformer-based architecture that attends between the image features and the prompt embeddings, allowing for seamless integration of the image context and the desired segmentation task.
- The decoder generates pixel-wise segmentation masks, each corresponding to the object or region of interest indicated by the prompt, along with a predicted quality (IoU) score for each mask.
One of the key innovations in SAM's architecture is its promptable design. Because prompts are embedded into the same space as the image features, the model can apply its zero-shot transfer capabilities and segment objects it has never encountered before, guided only by a point or box that indicates where to look. (The original paper also explored free-form text prompts, but these are not part of the publicly released model.)
The prompt embeddings are fused with the visual representations from the ViT encoder inside the mask decoder. This fusion of visual and prompt information enables SAM to resolve which object or region the prompt refers to, resulting in highly accurate and robust segmentation masks.
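These components are exposed as attributes on the model object in the segment-anything package, so you can inspect them directly. A minimal sketch, assuming the vit_h checkpoint has already been downloaded to the path shown:
from segment_anything import sam_model_registry

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")

print(type(sam.image_encoder).__name__)   # ImageEncoderViT: the ViT image encoder
print(type(sam.prompt_encoder).__name__)  # PromptEncoder: embeds point/box/mask prompts
print(type(sam.mask_decoder).__name__)    # MaskDecoder: fuses embeddings into masks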
SAM's architecture is designed to be scalable and efficient, making it suitable for deployment in various medical imaging applications. The model's ability to leverage transfer learning and adapt to new tasks without extensive retraining makes it a powerful tool for addressing the diverse challenges encountered in the field of medical image analysis.
Importance of Segmentation in Medical Imaging

Source: Llama 3
Accurate segmentation of medical images plays a pivotal role in the diagnosis and treatment of various conditions. By precisely delineating anatomical structures, lesions, and abnormalities, segmentation enables healthcare professionals to:
- Detect and Characterize Pathologies: Segmentation techniques can assist in identifying and characterizing tumors, lesions, and other abnormalities, providing valuable information for diagnosis and treatment planning.
- Quantify and Monitor Disease Progression: By segmenting and quantifying changes in anatomical structures or lesions over time, healthcare professionals can monitor disease progression and evaluate the effectiveness of treatments.
- Assist in Treatment Planning: Precise segmentation of target structures and surrounding tissues is crucial for radiation therapy planning, surgical planning, and other interventional procedures, ensuring accurate and effective treatment delivery.
- Facilitate Research and Clinical Trials: Segmentation of medical images is essential for conducting research studies, clinical trials, and developing new diagnostic and therapeutic approaches, enabling the analysis of large datasets and the identification of patterns and biomarkers.
Segmentation plays a vital role in various medical applications across different imaging modalities, including:
- Brain Imaging (MRI): Segmentation of brain structures, tumors, and lesions is crucial for diagnosing and monitoring neurological conditions, such as stroke, traumatic brain injury, and neurodegenerative diseases.
- Cardiovascular Imaging (CT, MRI): Segmentation of the heart, blood vessels, and associated structures is essential for diagnosing and treating cardiovascular diseases, including coronary artery disease, heart failure, and congenital heart defects.
- Oncology Imaging (CT, MRI, PET): Accurate segmentation of tumors and metastases is critical for cancer staging, treatment planning, and response monitoring, enabling personalized and targeted therapies.
- Musculoskeletal Imaging (X-ray, CT, MRI): Segmentation of bones, joints, and soft tissues is important for diagnosing and treating musculoskeletal disorders, such as fractures, arthritis, and sports-related injuries.
As medical imaging technologies continue to evolve and new modalities emerge, the importance of accurate and efficient segmentation techniques will only increase. The integration of advanced AI models like SAM into medical imaging workflows has the potential to revolutionize the field, enabling more precise diagnoses, personalized treatment plans, and ultimately, improved patient outcomes.
SAM in Medical Imaging

Source: Llama 3
Application Across Different Modalities (MRI, CT, X-ray)
The versatility of the Segment Anything Model (SAM) makes it a valuable tool for medical image analysis across various imaging modalities, including magnetic resonance imaging (MRI), computed tomography (CT), and X-ray imaging. Each of these modalities plays a crucial role in different aspects of medical diagnosis and treatment, and SAM's ability to segment objects and structures accurately can significantly enhance the utility of these imaging techniques.
MRI Segmentation: MRI is widely used for visualizing soft tissues, such as the brain, muscles, and internal organs. SAM can be employed for segmenting brain structures, tumors, and lesions, aiding in the diagnosis and monitoring of neurological conditions like stroke, traumatic brain injury, and neurodegenerative diseases. Additionally, SAM can segment organs like the heart, liver, and kidneys, enabling accurate assessment of their structure and function.
CT Segmentation: CT imaging excels at visualizing bony structures and detecting abnormalities in various organs. SAM can be applied to segment bones, joints, and soft tissues, facilitating the diagnosis and treatment of musculoskeletal disorders, such as fractures, arthritis, and sports-related injuries. Furthermore, SAM can segment tumors, lymph nodes, and blood vessels, contributing to cancer staging, treatment planning, and cardiovascular disease management.
X-ray Segmentation: X-ray imaging is widely used for diagnosing bone fractures, lung conditions, and other abnormalities. SAM can segment bones, lung structures, and potential lesions or masses, assisting in the interpretation of X-ray images and enabling more accurate diagnoses.
What Can We Use SAM for in the Medical Imaging Field?
The applications of SAM in the medical imaging field are diverse and far-reaching. Some potential use cases include:
- Tumor and Lesion Segmentation: SAM can accurately segment tumors, lesions, and other abnormalities across various imaging modalities, aiding in diagnosis, treatment planning, and monitoring disease progression.
- Organ Segmentation: By precisely delineating organs and anatomical structures, SAM can facilitate the assessment of organ function, detect abnormalities, and assist in surgical planning.
- Radiation Therapy Planning: Accurate segmentation of target structures and surrounding tissues is crucial for radiation therapy planning, ensuring precise and effective treatment delivery.
- Clinical Research and Trials: SAM can be employed in clinical research and trials, enabling the analysis of large datasets and the identification of patterns and biomarkers, ultimately contributing to the development of new diagnostic and therapeutic approaches.
- Computer-Aided Diagnosis (CAD): SAM can be integrated into computer-aided diagnosis (CAD) systems, providing valuable supplementary information to radiologists and enhancing diagnostic accuracy.
By applying SAM's capabilities in medical imaging, healthcare professionals can streamline image analysis processes, reduce the time and effort required for manual annotation, and increase the accuracy and consistency of segmentation results. This, in turn, can lead to more reliable diagnoses, personalized treatment plans, and improved patient outcomes.
Practical Tutorial For Tumor Detection in MRI Scans
To illustrate the practical application of SAM in medical imaging, let's explore a case study focused on tumor detection in magnetic resonance imaging (MRI) scans of the brain, using an MRI brain tumor dataset: a publicly available collection of brain tumor images from Roboflow. The dataset consists of over 3,000 images, providing a solid starting point for evaluating the model.
Brain tumors can have severe consequences if left undetected or untreated, making early and accurate diagnosis crucial. MRI scans are widely used for visualizing brain structures and detecting abnormalities, such as tumors. However, manual segmentation of tumors from MRI scans can be time-consuming, subjective, and prone to errors, particularly when dealing with complex or irregularly shaped lesions.
Step 1: Getting Access to Colab GPU (Using Google Colab)
Let's make sure that we have access to a GPU. We can use the nvidia-smi command to check. If you are using Google Colab and run into problems, navigate to Edit -> Notebook settings -> Hardware accelerator, set it to GPU, and then click Save.
!nvidia-smi
Step 2: Create a Home Constant
import os
HOME = os.getcwd()
print("HOME:", HOME)
Step 3: Install Segment Anything Model and Other Dependencies
We install the SAM model and other dependencies like roboflow, jupyter_bbox_widget, dataclasses-json and supervision.
!pip install -q 'git+https://github.com/facebookresearch/segment-anything.git'
!pip install -q jupyter_bbox_widget roboflow dataclasses-json supervision
Step 4: Download the Weights
Let’s create a directory called 'weights' in the current working directory using the system command 'mkdir'. We will then download a pre-trained checkpoint file for the Segment Anything Model (SAM) from a public URL and save it in the 'weights' directory using the 'wget' command. Next, we import the PyTorch library, which is a popular deep learning framework. We then determine whether a GPU is available on the system for running computations using PyTorch. If a GPU is available, it sets the device to the first available GPU ('cuda:0'); otherwise, it sets the device to the CPU. Finally, it specifies the type of SAM model to be used, which in this case is "vit_h" (Vision Transformer - Huge), one of the available model configurations for SAM.
import os
import torch

!mkdir -p {HOME}/weights
!wget -q https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth -P {HOME}/weights

CHECKPOINT_PATH = os.path.join(HOME, "weights", "sam_vit_h_4b8939.pth")
print(CHECKPOINT_PATH, "; exist:", os.path.isfile(CHECKPOINT_PATH))

# Select the device and the SAM configuration described above
DEVICE = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
MODEL_TYPE = "vit_h"
Step 5: Import SAM Model
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator, SamPredictor

sam = sam_model_registry[MODEL_TYPE](checkpoint=CHECKPOINT_PATH).to(device=DEVICE)
Step 6: Download the Dataset from Roboflow
We set the working directory to the HOME directory to ensure that the dataset is downloaded from Roboflow and extracted to the HOME directory.
%cd {HOME}

import roboflow
from roboflow import Roboflow

roboflow.login()
rf = Roboflow()
project = rf.workspace("hashira-fhxpj").project("mri-brain-tumor")
dataset = project.version(1).download("coco")
Step 7: Write Helpers for COCOCategory
We define a dataclass COCOCategory using the @dataclass and @dataclass_json decorators.
import numpy as np
from dataclasses import dataclass
from typing import List, Tuple, Union, Optional
from dataclasses_json import dataclass_json
from supervision import Detections


@dataclass_json
@dataclass
class COCOCategory:
    id: int
    name: str
    supercategory: str
Step 8: Write Helpers for COCOImage
We define a dataclass COCOImage using the @dataclass and @dataclass_json decorators.
@dataclass_json
@dataclass
class COCOImage:
    id: int
    width: int
    height: int
    file_name: str
    license: int
    date_captured: str
    coco_url: Optional[str] = None
    flickr_url: Optional[str] = None
Step 9: Write Helpers for COCOAnnotation
@dataclass_json
@dataclass
class COCOAnnotation:
    id: int
    image_id: int
    category_id: int
    segmentation: List[List[float]]
    area: float
    bbox: Tuple[float, float, float, float]
    iscrowd: int
Step 10: Write Helpers for COCOLicense
@dataclass_json
@dataclass
class COCOLicense:
    id: int
    name: str
    url: str
Step 11: Write Helpers for COCOJson and Load the COCO JSON
We define a dataclass COCOJson and a function load_coco_json that takes a JSON file path as input and returns a COCOJson instance.
@dataclass_json
@dataclass
class COCOJson:
    images: List[COCOImage]
    annotations: List[COCOAnnotation]
    categories: List[COCOCategory]
    licenses: List[COCOLicense]


def load_coco_json(json_file: str) -> COCOJson:
    import json

    with open(json_file, "r") as f:
        json_data = json.load(f)

    return COCOJson.from_dict(json_data)
Step 12: Write Helpers for COCOJsonUtility
We define the COCOJsonUtility class with a static method that takes a COCOJson instance and an image_id as input, and returns the list of COCOAnnotation instances associated with the specified image_id.
class COCOJsonUtility:
    @staticmethod
    def get_annotations_by_image_id(coco_data: COCOJson, image_id: int) -> List[COCOAnnotation]:
        return [annotation for annotation in coco_data.annotations if annotation.image_id == image_id]
Step 13: Define Static Methods to Look Up Annotations by Image Path
These static methods continue the COCOJsonUtility class. get_annotations_by_image_path takes a COCOJson instance and an image path and returns the COCOAnnotation instances associated with that image file, using the get_image_by_path helper to look up the COCOImage by its file name.
    # These static methods continue the COCOJsonUtility class defined above.
    @staticmethod
    def get_annotations_by_image_path(coco_data: COCOJson, image_path: str) -> Optional[List[COCOAnnotation]]:
        image = COCOJsonUtility.get_image_by_path(coco_data, image_path)
        if image:
            return COCOJsonUtility.get_annotations_by_image_id(coco_data, image.id)
        else:
            return None

    @staticmethod
    def get_image_by_path(coco_data: COCOJson, image_path: str) -> Optional[COCOImage]:
        for image in coco_data.images:
            if image.file_name == image_path:
                return image
        return None
Step 14: Define Static Method to Convert COCOAnnotations to Detections
This static method, also part of the COCOJsonUtility class, takes a list of COCOAnnotation instances and converts them into a supervision Detections instance, transforming COCO's (x_min, y_min, width, height) boxes into (x_min, y_min, x_max, y_max) format.
    # Also part of the COCOJsonUtility class.
    @staticmethod
    def annotations2detections(annotations: List[COCOAnnotation]) -> Detections:
        class_id, xyxy = [], []

        for annotation in annotations:
            x_min, y_min, width, height = annotation.bbox
            class_id.append(annotation.category_id)
            xyxy.append([
                x_min,
                y_min,
                x_min + width,
                y_min + height
            ])

        return Detections(
            xyxy=np.array(xyxy, dtype=int),
            class_id=np.array(class_id, dtype=int)
        )
Step 15: Load the COCO Annotations
import os

DATA_SET_SUBDIRECTORY = "test"
ANNOTATIONS_FILE_NAME = "_annotations.coco.json"
IMAGES_DIRECTORY_PATH = os.path.join(dataset.location, DATA_SET_SUBDIRECTORY)
ANNOTATIONS_FILE_PATH = os.path.join(dataset.location, DATA_SET_SUBDIRECTORY, ANNOTATIONS_FILE_NAME)

coco_data = load_coco_json(json_file=ANNOTATIONS_FILE_PATH)

CLASSES = [
    category.name
    for category
    in coco_data.categories
    if category.supercategory != 'none'
]

IMAGES = [
    image.file_name
    for image
    in coco_data.images
]
Step 16: Set Random Seed
import random
random.seed(10)

mask_predictor = SamPredictor(sam)
Step 17: Loop Through Multiple Images
import cv2
import supervision as sv

for image_name in IMAGES:
    image_path = os.path.join(dataset.location, DATA_SET_SUBDIRECTORY, image_name)

    # Load dataset annotations
    annotations = COCOJsonUtility.get_annotations_by_image_path(coco_data=coco_data, image_path=image_name)
    ground_truth = COCOJsonUtility.annotations2detections(annotations=annotations)

    # Adjust class ids
    ground_truth.class_id = ground_truth.class_id - 1

    # Load image
    image_bgr = cv2.imread(image_path)
    image_rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)
Step 18: Initiate Annotator
    # Initiate annotators (continues inside the loop from the previous step)
    bounding_box_annotator = sv.BoundingBoxAnnotator(color=sv.Color.red(), color_lookup=sv.ColorLookup.INDEX)
    mask_annotator = sv.MaskAnnotator(color=sv.Color.red(), color_lookup=sv.ColorLookup.INDEX)

    # Annotate ground truth
    annotated_frame_ground_truth = bounding_box_annotator.annotate(scene=image_bgr.copy(), detections=ground_truth)
Step 19: Run SAM Inference
We run SAM inference and inspect a few of the results.
    # Run SAM inference (still inside the loop)
    mask_predictor.set_image(image_rgb)

    masks, scores, logits = mask_predictor.predict(
        box=ground_truth.xyxy[0],
        multimask_output=True
    )

    detections = sv.Detections(
        xyxy=sv.mask_to_xyxy(masks=masks),
        mask=masks
    )
    detections = detections[detections.area == np.max(detections.area)]

    annotated_image = mask_annotator.annotate(scene=image_bgr.copy(), detections=detections)

    sv.plot_images_grid(
        images=[annotated_frame_ground_truth, annotated_image],
        grid_size=(1, 2),
        titles=['source image', 'segmented image']
    )

The left part of the image shows the original MRI scan, which appears to be a cross-sectional view of the brain. In this image, there is a red rectangular bounding box drawn around a specific region of interest, indicating the presence of a tumor or lesion.
The right part of the image shows the result of applying the SAM model to the original MRI scan. The model has successfully segmented and highlighted the tumor or lesion region within the brain, depicted in red. This segmentation process enables the precise delineation and isolation of the region of interest from the surrounding brain structures.
Step 20: Initialize W&B
!pip install wandb

import wandb
from PIL import Image

# Initialize W&B
wandb.login()
run = wandb.init(project="segment-anything", name="sam-multiple-images")


def mask_to_image(mask):
    """Convert a binary mask to an RGB image."""
    mask = np.repeat(mask[:, :, np.newaxis], 3, axis=2)
    mask = mask.astype(np.uint8) * 255
    return Image.fromarray(mask)
Step 21: Loop Through Multiple Images to Get Visualization on W&B
# Loop through multiple images
for image_name in IMAGES:
    image_path = os.path.join(dataset.location, DATA_SET_SUBDIRECTORY, image_name)

    # Load dataset annotations
    annotations = COCOJsonUtility.get_annotations_by_image_path(coco_data=coco_data, image_path=image_name)
    ground_truth = COCOJsonUtility.annotations2detections(annotations=annotations)

    # Adjust class ids
    ground_truth.class_id = ground_truth.class_id - 1
Step 22: Load the Image and Run SAM Inference
    # Load image (continues inside the loop from the previous step)
    image_bgr = cv2.imread(image_path)
    image_rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)

    # Run SAM inference
    mask_predictor.set_image(image_rgb)

    masks, scores, logits = mask_predictor.predict(
        box=ground_truth.xyxy[0],
        multimask_output=True
    )

    detections = sv.Detections(
        xyxy=sv.mask_to_xyxy(masks=masks),
        mask=masks
    )
    detections = detections[detections.area == np.max(detections.area)]
Step 23: Annotate the Images for Logging
    # Annotate images (still inside the loop)
    bounding_box_annotator = sv.BoundingBoxAnnotator(color=sv.Color.red(), color_lookup=sv.ColorLookup.INDEX)
    mask_annotator = sv.MaskAnnotator(color=sv.Color.red(), color_lookup=sv.ColorLookup.INDEX)

    annotated_frame_ground_truth = bounding_box_annotator.annotate(scene=image_bgr.copy(), detections=ground_truth)
    annotated_image = mask_annotator.annotate(scene=image_bgr.copy(), detections=detections)

    # Convert annotated images to PIL Images for logging
    source_image_pil = Image.fromarray(cv2.cvtColor(annotated_frame_ground_truth, cv2.COLOR_BGR2RGB))
    segmented_image_pil = Image.fromarray(cv2.cvtColor(annotated_image, cv2.COLOR_BGR2RGB))
Step 24: Log Images and Masks to W&B
    # Log images and masks to W&B (still inside the loop)
    wandb.log({
        "source_image": wandb.Image(source_image_pil, caption=f"Source Image with Ground Truth ({image_name})"),
        "segmented_image": wandb.Image(segmented_image_pil, caption=f"Segmented Image by SAM ({image_name})"),
        "masks": [wandb.Image(mask_to_image(mask), caption=f"Mask {i} for {image_name}") for i, mask in enumerate(masks)],
        "scores": {f"score_{i}": score for i, score in enumerate(scores)}  # Log scores for each mask
    })

# Finish the W&B run once the loop has processed every image
wandb.finish()

The image shows three line charts tracking the per-mask quality scores that SAM returns alongside each prediction. Because we call predict with multimask_output=True, SAM produces three candidate masks per image, and score_0, score_1, and score_2 chart the model's own quality estimate (predicted IoU) for each of them, where higher values are better; in this run the values range from roughly 0.93 to 0.98. The logged images show the source image with its ground-truth bounding box, the SAM-segmented image, and the individual candidate masks for each input.
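If you want to compare the per-mask scores across many images more conveniently, one optional addition (not part of the tutorial code above) is to log them to a W&B Table; the column names below are illustrative, and in practice the rows would be added inside the per-image loop:
import wandb

# Assumes an active W&B run (wandb.init was called earlier).
score_table = wandb.Table(columns=["image", "mask_index", "predicted_score"])

# Example rows; in the tutorial you would call add_data once per candidate mask
# inside the loop: score_table.add_data(image_name, i, float(score))
score_table.add_data("example_scan.jpg", 0, 0.95)
score_table.add_data("example_scan.jpg", 1, 0.97)

# Log the table once, before wandb.finish().
wandb.log({"mask_scores": score_table})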
Conclusion
In conclusion, the Segment Anything Model (SAM) has the potential to transform medical imaging by providing accurate and efficient segmentation capabilities. Integrating SAM with W&B gives researchers experiment tracking, model versioning, and performance monitoring, enabling seamless monitoring and optimization of SAM's performance in critical medical applications like tumor detection and lesion segmentation. As AI technologies continue to advance, the combination of SAM and W&B holds significant potential for further enhancing medical image analysis, ultimately leading to improved patient outcomes and more effective disease management.