Chest X-Rays

Exploring chest x-ray data and strategies for real-world long-tailed data
Created on April 13|Last edited on April 17

Introduction

Deep learning holds great promise for the medical field. A computer model can assist doctors with references or prioritization as they diagnose patients. When professional expertise is scarce, the model can make a best guess based on the statistics of all the patient history it has seen. In a concrete example from 2017, Pranav Rajpurkar et al. from the Stanford ML Group developed CheXNet, a computer vision model which, in certain controlled contexts, diagnoses pneumonia from chest x-rays more accurately than an average radiologist. This impressive result required the NIH Chest X-ray Dataset of 112,120 x-rays with disease labels from 30,805 unique patients.

To create these labels, the authors used Natural Language Processing to text-mine disease classifications from the associated radiological reports. The labels are expected to be >90% accurate and suitable for weakly-supervised learning.

Overview

Dataset: Random 5% sample of 112K+ anonymous chest x-rays

The National Institutes of Health's Chest X-ray dataset is available from Kaggle. This code uses the provided random sample of 5% (5,606 images).

Experiments to try

Training on the natural, highly-skewed distribution saturates quickly. Naive convolutional models trained on the unbalanced data simply predict the most frequent class. How can we account for the long tail and improve accuracy?

  • balanced training: create a balanced split of the dataset (perhaps expanding to all 200GB?)
  • image cropping to exclude the annotations/edges
  • feed in extra metadata, such as age and gender, as an embedding
  • multi-label classification scenario
  • Facebook classifier balancing approach: repo and paper
  • use a MAML model with something like 100-shot learning?
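
Two of the ideas above, multi-label targets and balanced training, can be sketched in a few lines of plain Python. This is a rough illustration and not the code behind this report. It assumes each image's findings arrive as one pipe-separated string (for example "Effusion|Infiltration"), and the helper names `multi_hot`, `sampling_weights`, and `balanced_sample` are hypothetical. Weighting each image by its rarest label is just one possible heuristic.

```python
import random
from collections import Counter

# Class names as they appear in this report. The "|" separator is an
# assumption about how multi-label findings are stored.
CLASSES = [
    "Atelectasis", "Cardiomegaly", "Consolidation", "Edema", "Effusion",
    "Emphysema", "Fibrosis", "Hernia", "Infiltration", "Mass", "No Finding",
    "Nodule", "Pleural_Thickening", "Pneumonia", "Pneumothorax",
]


def multi_hot(finding_labels, classes=CLASSES, sep="|"):
    """Encode a label string as a 0/1 vector with one slot per class."""
    present = set(finding_labels.split(sep))
    return [1 if name in present else 0 for name in classes]


def sampling_weights(label_strings, sep="|"):
    """Weight each image by the inverse count of its rarest label."""
    counts = Counter(
        label for s in label_strings for label in s.split(sep)
    )
    return [
        1.0 / min(counts[label] for label in s.split(sep))
        for s in label_strings
    ]


def balanced_sample(label_strings, k, seed=0):
    """Draw k image indices with replacement, favouring rare classes."""
    rng = random.Random(seed)
    weights = sampling_weights(label_strings)
    return rng.choices(range(len(label_strings)), weights=weights, k=k)


# Toy labels, made up for illustration (not taken from the dataset).
toy = ["No Finding"] * 6 + ["Effusion", "Effusion|Infiltration", "Hernia"]
toy_weights = sampling_weights(toy)
toy_batch = balanced_sample(toy, k=8)
```

In the toy list, each "No Finding" image gets weight 1/6 while the single "Hernia" image gets weight 1, so rare findings are drawn far more often than their raw frequency would suggest. Sampling with replacement means rare images repeat within an epoch, which may call for stronger augmentation.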

How skewed is the data?

  • Hernia - 13 images
  • Pneumonia - 62 images
  • Fibrosis - 84 images
  • Edema - 118 images
  • Emphysema - 127 images
  • Cardiomegaly - 141 images
  • Pleural_Thickening - 176 images
  • Consolidation - 226 images
  • Pneumothorax - 271 images
  • Mass - 284 images
  • Nodule - 313 images
  • Atelectasis - 508 images
  • Effusion - 644 images
  • Infiltration - 967 images
  • No Finding - 3044 images
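
To put numbers on the skew, the counts above can be summarized with a short script. This is a minimal sketch using only the figures quoted in this report (the per-class counts and the 5,606-image sample size). The variable names are illustrative, and reading the majority share as a baseline accuracy assumes a single-label setup where "No Finding" is scored as its own class.

```python
# Per-class image counts copied from the list in this report.
counts = {
    "Hernia": 13, "Pneumonia": 62, "Fibrosis": 84, "Edema": 118,
    "Emphysema": 127, "Cardiomegaly": 141, "Pleural_Thickening": 176,
    "Consolidation": 226, "Pneumothorax": 271, "Mass": 284, "Nodule": 313,
    "Atelectasis": 508, "Effusion": 644, "Infiltration": 967,
    "No Finding": 3044,
}
TOTAL_IMAGES = 5606  # size of the 5% sample, as stated in the Overview

rarest = min(counts, key=counts.get)
most_common = max(counts, key=counts.get)

# How many times more frequent the largest class is than the smallest.
imbalance_ratio = counts[most_common] / counts[rarest]

# Fraction of images a model would get right by always predicting the
# most frequent class (under the single-label assumption noted above).
majority_share = counts[most_common] / TOTAL_IMAGES

# Sum of per-class counts; exceeds TOTAL_IMAGES when images carry
# more than one label.
label_total = sum(counts.values())

print(f"{most_common} vs {rarest}: {imbalance_ratio:.0f}x")
print(f"majority-class share: {majority_share:.1%}")
print(f"label occurrences: {label_total} across {TOTAL_IMAGES} images")
```

The per-class counts add up to 6,978, more than the 5,606 images in the sample, which is consistent with some x-rays carrying several findings at once. The largest class is roughly 234 times the size of the smallest, and "No Finding" alone covers a little over half the images, so a model that always predicts it already looks deceptively accurate.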
