Deep Learning and MLOps for Health Care: A look into MedSAM
Introduction: MedSAM's practical applications

In the bustling corridors of a modern hospital, radiologists and clinicians are often seen poring over a myriad of medical images — from X-rays to MRIs — seeking to unravel the mysteries hidden within these complex visuals. Each image is a puzzle, where accurately identifying and delineating regions of interest (ROIs) can be the key to a correct diagnosis, an effective treatment plan, or even a life-saving intervention. This is the world of medical image segmentation, a domain where precision meets the critical needs of patient care.
Traditionally, this segmentation has been a manual and labor-intensive process, requiring hours of expert attention for each image. The advent of semi-automatic and fully automatic segmentation methods offered a glimmer of hope, promising to reduce the time and labor involved. However, these methods, until recently, have been plagued by a lack of versatility and consistency, especially when faced with varying tasks or imaging modalities.
Enter MedSAM, a groundbreaking deep learning model that stands at the forefront of a new era in medical image analysis. MedSAM, or Medical Segment Anything Model, is a universal foundation model designed to bridge the gap in the current landscape of medical image segmentation. It leverages the power of a large-scale dataset encompassing over 1.57 million image-mask pairs, covering an extensive range of more than 30 cancer types and 10 imaging modalities.
What sets MedSAM apart is its exceptional ability to adapt and perform across a diverse spectrum of segmentation tasks, demonstrating superior accuracy and robustness compared to traditional modality-specific models. This leap forward in technology is not just a theoretical advancement; it has tangible, real-world implications. With MedSAM, clinicians can look forward to more accurate diagnoses, personalized treatment plans, and efficient monitoring of diseases — all achieved with unprecedented speed and precision.
The development of MedSAM is inspired by the strides made in natural image segmentation, particularly the Segment Anything Model (SAM) and its counterparts. These models showed remarkable versatility across various segmentation tasks but fell short when applied to the nuanced and complex world of medical images. MedSAM, through extensive fine-tuning and evaluation, overcomes these challenges, offering a refined model specifically tailored for the medical domain.
Through rigorous evaluation on 86 internal and 146 external validation tasks, MedSAM has consistently outperformed state-of-the-art segmentation models, proving its worth as a versatile, high-performing tool in medical image analysis.
A recap on SAM: Segment Anything

Task and Pre-training:
SAM's core is the promptable segmentation task, drawing from NLP's prompt-based learning. It involves returning a valid segmentation mask for various prompts, including points, boxes, masks, or text. The pre-training algorithm adapts interactive segmentation techniques, simulating sequences of prompts for each image and assessing the model's ability to predict valid masks for ambiguous prompts.
Model Architecture:
Image Encoder: SAM employs a Vision Transformer (ViT) pre-trained with Masked Autoencoders (MAE), adapted for high-resolution input processing. It runs once per image to create a detailed image embedding.
Prompt Encoder: This component handles different types of prompts. Sparse prompts (points, boxes, text) use positional encodings and learned embeddings, while dense prompts (masks) are embedded using convolutions. Text prompts are processed with a CLIP-based text encoder.
Mask Decoder: It efficiently maps the image and prompt embeddings to a segmentation mask using a modified Transformer decoder block. This block incorporates prompt self-attention and cross-attention mechanisms and concludes with a dynamic mask prediction head for computing mask probabilities.
Ambiguity Resolution and Efficiency:
SAM can generate multiple valid masks for a single, ambiguous prompt, each with a confidence score. It achieves this through a model modification that allows the prediction of several output masks. The design prioritizes efficiency: with precomputed image embeddings, the prompt encoder and mask decoder can operate in ∼50ms on standard computing environments, facilitating real-time interactions.
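For readers who want to see what this looks like in practice, here is a minimal sketch of prompting SAM with the open-source segment-anything package. The checkpoint path, image, and point prompt below are placeholders, and the snippet assumes a downloaded ViT-B checkpoint.

```python
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

# Load a pre-trained SAM checkpoint (the ViT-B file name is the one released
# by Meta; the local path is illustrative).
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

# A stand-in RGB image; in practice this would be a loaded scan or photo.
image = np.zeros((512, 512, 3), dtype=np.uint8)
predictor.set_image(image)  # the heavy image encoder runs once here

# A single foreground point is an ambiguous prompt, so we ask for multiple
# candidate masks, each returned with its own confidence score.
masks, scores, logits = predictor.predict(
    point_coords=np.array([[256, 256]]),
    point_labels=np.array([1]),  # 1 = foreground, 0 = background
    multimask_output=True,
)
print(masks.shape, scores)  # up to 3 masks of shape (H, W), one score each
```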
Training and Losses:
SAM's training integrates a blend of geometric and text-based prompts. The model is supervised using a linear combination of focal loss and dice loss, simulating an interactive setup with random prompt sampling. This approach ensures seamless integration into diverse segmentation tasks, allowing SAM to adapt to various practical segmentation scenarios via prompt engineering.
Focal Loss is primarily used to address class imbalance in object detection tasks. It was introduced in the context of training deep neural networks for object detection in scenarios where there is a significant imbalance between the background and foreground classes.
Problem Addressed: In many object detection tasks, the majority of the anchor boxes are negative (representing background), leading to an imbalance between the background and object classes. This imbalance can result in the model being overwhelmed by the sheer number of easy negatives, causing it to perform poorly on detecting actual objects.
How It Works: Focal Loss modifies the standard cross-entropy loss by adding a factor that reduces the loss for well-classified examples. The idea is to focus the model's training on hard negatives. It is computed as:
$$\mathrm{FL}(p_t) = -\alpha_t \, (1 - p_t)^{\gamma} \, \log(p_t)$$
Where:
- $p_t$ is the model's estimated probability for the true class.
- $\alpha_t$ is a balancing parameter.
- $\gamma$ is a focusing parameter that adjusts the rate at which easy examples are down-weighted. The higher the $\gamma$, the more focus on hard, misclassified examples.
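As a concrete reference, here is a minimal PyTorch sketch of binary focal loss; the alpha and gamma defaults are the common values from the original paper, not MedSAM-specific settings.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """logits and targets have the same shape, e.g. (N, H, W); targets are 0/1 floats."""
    # Standard BCE gives -log(p_t) per element.
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)              # probability of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)  # class-balancing weight
    # Down-weight easy, well-classified examples by (1 - p_t)^gamma.
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()
```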
Dice Loss is a loss function often used for segmentation tasks, particularly in medical image processing. It's effective for data with a high imbalance between the object and background pixels.
Problem Addressed: In segmentation, especially medical imaging, the region of interest (like a tumor in an MRI scan) often occupies a much smaller area compared to the background. Standard loss functions like cross-entropy may not perform well due to this imbalance.
How It Works: Dice Loss is based on the Dice Coefficient, which is a measure of overlap between two samples. It is particularly useful for measuring the similarity between the predicted segmentation and the ground truth. The Dice Coefficient is calculated as:
$$\mathrm{Dice}(X, Y) = \frac{2\,|X \cap Y|}{|X| + |Y|}$$
Where $X$ is the predicted set of pixels and $Y$ is the ground truth. The Dice Loss is then formulated as:
$$\mathcal{L}_{\mathrm{Dice}} = 1 - \mathrm{Dice}(X, Y)$$
This loss function works well for segmentation because it directly maximizes the overlap between the predicted segmentation and the ground truth, making it less sensitive to class imbalance.
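A corresponding soft Dice loss sketch in PyTorch is shown below; the smoothing constant is a common convention to avoid division by zero, not a value taken from MedSAM.

```python
import torch

def dice_loss(logits, targets, smooth=1e-6):
    """logits and targets have shape (N, H, W); targets are 0/1 floats."""
    probs = torch.sigmoid(logits)
    intersection = (probs * targets).sum(dim=(1, 2))
    union = probs.sum(dim=(1, 2)) + targets.sum(dim=(1, 2))
    dice = (2 * intersection + smooth) / (union + smooth)  # per-sample Dice coefficient
    return 1 - dice.mean()
```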
In summary:
- Focal Loss addresses class imbalance by focusing training on difficult, misclassified examples in classification tasks.
- Dice Loss is used in segmentation tasks to maximize the overlap between the prediction and the ground truth in scenarios with high imbalance.
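Since SAM supervises its masks with a linear combination of these two losses, the overall objective can be sketched by reusing the functions above; the 20:1 focal-to-dice weighting is the ratio reported in the SAM paper and is an assumption here, not a documented MedSAM setting.

```python
def segmentation_loss(logits, targets):
    # Weighted sum of the two sketches above (focal_loss and dice_loss).
    return 20.0 * focal_loss(logits, targets) + dice_loss(logits, targets)
```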
Teaching SAM Medicine: MedSAM

TL;DR - Results at a glance:

Dataset Composition:
- MedSAM trained on a dataset of 1,090,486 medical image-mask pairs.
- Covered 15 imaging modalities and over 30 cancer types.
- Predominant modalities: CT, MRI, and endoscopy, with inclusion of ultrasound, pathology, fundus, dermoscopy, mammography, and OCT.
Internal Validation Results:
- Focused on 12 representative segmentation tasks.
- Evaluated using Dice Similarity Coefficient (DSC):
- Median DSC for key tasks: Intracranial hemorrhage CT (94.0%), Glioma MR T1 (94.4%), Pneumothorax CXR (81.5%), Polyp endoscopy (98.4%).
- MedSAM surpassed U-Net models' performance in most tasks.
- Comparable performance with U-Net in tasks with clear boundaries, e.g., skin cancer segmentation (95.2% vs 95.1% for U-Net).
External Validation Results:
- Involved over 30 new segmentation tasks from unseen datasets.
- MedSAM showed superior generalization, outperforming both SAM and U-Net specialist models in diverse tasks.
- Notable DSC improvements: Nasopharynx cancer segmentation (90.3%), outperforming SAM by 53.3% and U-Net by 24.5%.
- MedSAM also outperformed SAM and U-Net by 3-7% on unseen modalities such as abdominal T1 in-phase and out-of-phase MRI.
Quantitative Analysis:
- MedSAM demonstrated higher precision in tumor burden quantification, with a Pearson correlation of 0.99 compared to expert evaluations.
- In prostate segmentation, MedSAM's performance matched or exceeded that of six human experts.
Qualitative Observations:
- MedSAM effectively segmented objects with weak or missing boundaries.
- Superior performance in segmenting challenging targets like liver and cervical cancers.
- Saliency map analysis revealed that MedSAM's features are rich in semantic information relevant to anatomical structures.
What makes MedSAM unique
MedSAM keeps SAM's overall architecture (image encoder, prompt encoder, and mask decoder) but adapts it to the medical domain: the prompt encoder is kept frozen while the image encoder and mask decoder are fine-tuned on the large-scale medical dataset described above. Bounding boxes are used as the prompt, giving clinicians a simple, unambiguous way to indicate the region of interest. The result is a single model that transfers across imaging modalities and segmentation targets without task-specific retraining.
Training our own MedSAM for Breast Cancer Detection
Constructing an end-to-end MLOps pipeline
To fine-tune MedSAM for breast cancer detection, we build an end-to-end MLOps pipeline with Weights & Biases: datasets are versioned as artifacts, training and validation metrics are logged from every run, predictions are inspected in tables, hyperparameters are tuned with sweeps, the best checkpoint is promoted to the model registry, and the whole workflow is orchestrated with Launch and automated with webhooks and GitHub Actions.
Training Dataset
nielsr-breast-cancer-train (dataset artifact, direct lineage view)
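Here is a minimal sketch of how the training data can be versioned and consumed as a W&B artifact; the project name and local directory are placeholders, while the artifact name matches the one shown in the lineage view above.

```python
import wandb

# Log the raw training data once as a versioned dataset artifact.
with wandb.init(project="medsam-breast-cancer", job_type="upload-data") as run:
    artifact = wandb.Artifact("nielsr-breast-cancer-train", type="dataset")
    artifact.add_dir("data/train")  # local folder of images and masks (illustrative path)
    run.log_artifact(artifact)

# Later, a training run consumes the artifact, which records lineage automatically.
with wandb.init(project="medsam-breast-cancer", job_type="train") as run:
    data_dir = run.use_artifact("nielsr-breast-cancer-train:latest").download()
    # ... build the dataloader from data_dir ...
```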
Training Results - Logged Metrics
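A minimal sketch of the logging pattern behind these charts is shown below; the hyperparameters and per-epoch metric values are placeholders for whatever the real training loop produces.

```python
import wandb

run = wandb.init(
    project="medsam-breast-cancer",
    config={"lr": 1e-4, "epochs": 20, "batch_size": 8},  # illustrative hyperparameters
)

for epoch in range(run.config.epochs):
    # ... run one fine-tuning epoch with the combined dice + focal loss here ...
    train_loss, train_dice = 0.42, 0.91  # placeholders for the real epoch metrics
    wandb.log({"train/loss": train_loss, "train/dice": train_dice, "epoch": epoch})

run.finish()
```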
Automatic GPU Tracking
Validation Dataset
nielsr-breast-cancer-val (dataset artifact, direct lineage view)
Validation Results - Logged Metrics
Qualitative Analysis via Tables
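The table panels above can be produced with a sketch like the following; the images and masks here are synthetic stand-ins for real validation cases, and the overlay uses W&B's image-mask logging.

```python
import numpy as np
import wandb

run = wandb.init(project="medsam-breast-cancer", job_type="evaluation")
table = wandb.Table(columns=["case_id", "image", "dice"])

for case_id in range(4):
    image = np.random.randint(0, 255, (256, 256, 3), dtype=np.uint8)  # placeholder scan
    pred = np.random.randint(0, 2, (256, 256))                        # placeholder prediction
    gt = np.random.randint(0, 2, (256, 256))                          # placeholder ground truth
    overlay = wandb.Image(image, masks={
        "prediction": {"mask_data": pred, "class_labels": {1: "tumor"}},
        "ground_truth": {"mask_data": gt, "class_labels": {1: "tumor"}},
    })
    table.add_data(case_id, overlay, 0.90)  # placeholder Dice score

run.log({"qualitative_results": table})
run.finish()
```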
Hyperparameter Optimization via Sweeps
We can use the parallel coordinates plot to compare sweep runs, identify the best-performing configuration, and select that model for downstream tasks or for deployment via the model registry.
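A sweep over the fine-tuning hyperparameters can be defined with a sketch like this one; the search space, metric name, and train() entry point are illustrative.

```python
import wandb

sweep_config = {
    "method": "bayes",
    "metric": {"name": "val/dice", "goal": "maximize"},
    "parameters": {
        "lr": {"min": 1e-5, "max": 1e-3},
        "batch_size": {"values": [4, 8, 16]},
        "weight_decay": {"values": [0.0, 0.01]},
    },
}

def train():
    # Each agent invocation starts a run whose config is filled in by the sweep.
    with wandb.init() as run:
        # ... fine-tune MedSAM with run.config and compute validation Dice here ...
        wandb.log({"val/dice": 0.90})  # placeholder metric

sweep_id = wandb.sweep(sweep_config, project="medsam-breast-cancer")
wandb.agent(sweep_id, function=train, count=20)
```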
Looking into our Model Registry
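Once the best run is identified, its checkpoint can be promoted so that downstream jobs pull a stable, aliased model rather than an ad-hoc file. The sketch below logs the checkpoint as a model artifact and links it into the registry; the checkpoint path and registered-model name are placeholders.

```python
import wandb

with wandb.init(project="medsam-breast-cancer", job_type="register-model") as run:
    model_artifact = wandb.Artifact("medsam-breast-cancer-model", type="model")
    model_artifact.add_file("checkpoints/best_medsam.pth")  # illustrative checkpoint path
    run.log_artifact(model_artifact)
    # Link the artifact into the model registry under a stable name.
    run.link_artifact(model_artifact, "model-registry/MedSAM-Breast-Cancer")
```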
Orchestrating our ML workflow - Launch
W&B Launch lets us package the fine-tuning code as a reusable job, push it to a queue, and have an agent execute it on whatever compute the queue points at (a local GPU machine, Kubernetes, or a cloud provider). This turns the ad-hoc training script into a repeatable, parameterized step that can be re-run whenever the data or hyperparameters change.
Automating our ML workflow - Webhooks and GitHub Actions
With a webhook automation attached to the model registry, promoting a checkpoint to a new alias (for example, production) sends an HTTP request that triggers a GitHub Actions workflow, which can then evaluate, package, or deploy the newly registered model. This closes the loop: a change in the registry automatically kicks off the downstream CI/CD steps without manual intervention.