Predicting Lung Disease with Binary Classification on the NIH Chest X-ray Dataset
In this report, we will perform binary classification on the NIH Chest X-ray dataset.
Created on October 13 | Last edited on March 24
Overview
Machine learning with medical imagery has been a promising domain for quite a while now. In fact, many in the field think ML-centric diagnoses are a matter of “when,” not “if.” But because false negatives and false positives can be so harmful to patients, the industry and researchers in this field remain fairly tentative.
Chest X-rays, like most medical images, are close to ideal from a data perspective. They’re largely uniform in size and angle, and many are publicly available (with personally identifying information redacted, of course).
Today, we’re going to see whether we can leverage an NIH dataset of those images to predict lung disease diagnoses. Specifically, our output is a prediction about whether we’re looking at a normal lung or an abnormal lung.
Task Performed: Binary Classification
Input Type: Image
Output: Prediction score denoting either normal or abnormal lung.
Let’s dig in:
Dataset
The NIH Chest X-ray dataset comprises 112,120 X-ray images from 30,805 unique patients, with 14 text-mined disease labels. The 14 disease labels are Atelectasis, Cardiomegaly, Consolidation, Edema, Effusion, Emphysema, Fibrosis, Hernia, Infiltration, Mass, Nodule, Pleural Thickening, Pneumonia, and Pneumothorax.
To create these labels, the authors used Natural Language Processing to text-mine disease classifications from the associated radiological reports. The labels are expected to be >90% accurate and suitable for weakly-supervised learning.
License and Attribution
- There are no restrictions on the use of the NIH chest x-ray images.
- The data is provided by the NIH Clinical Center and is available through the NIH download site: https://nihcc.app.box.com/v/ChestXray-NIHCC/folder/36938765345
- Wang, et al. “ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases” ArXiv:1705.02315 [cs.CV], May 2017. arXiv.org, https://arxiv.org/abs/1705.02315.
Model: v1
Problem Formulation
With this model, the intent is to classify a given X-ray image as either normal (no associated disease) or abnormal (one or more diseases). The model thus performs binary classification.
Intended Use Cases
- Research: To further research into automatic, deep learning-based reading of chest X-rays for computer-aided diagnosis (CAD).
- Pretrained Weights: To provide pre-trained weights for downstream tasks involving X-ray images.
- Promote: To promote the use of model cards for reporting models.
Uses to avoid
- This version of the model is not to be used in production to determine whether an X-ray image shows lung disease.
- This model is not to be considered state of the art (SOTA).
Training Data
51,759 samples of the NIH Chest X-ray dataset are labeled with one or more diseases (multi-label). The label for such samples is converted to 1.
The remaining samples are labeled No Finding. The NLP-based labeling technique used by the authors of the dataset could not associate any disease with these samples. The label for such samples is converted to 0.
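The label conversion described above can be sketched as follows. This is a minimal illustration rather than the code used for this report: it assumes each raw label is a string in which multiple findings are separated by `|`, and the function name `to_binary_label` is hypothetical.

```python
def to_binary_label(finding_labels: str) -> int:
    """Map a raw label string to 0 (No Finding) or 1 (one or more diseases)."""
    # Assumption: multiple findings are joined with "|" in the raw label.
    findings = [f.strip() for f in finding_labels.split("|") if f.strip()]
    if not findings:
        raise ValueError("empty label string")
    return 0 if findings == ["No Finding"] else 1


examples = ["No Finding", "Cardiomegaly", "Effusion|Infiltration"]
print([to_binary_label(s) for s in examples])  # [0, 1, 1]
```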
20,000 training images, 5,000 validation images, and 10,000 test images were used to train, validate, and test model:v0.
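The report does not say how these subsets were drawn, so here is one possible sketch: a seeded random draw of non-overlapping indices. The function name `split_indices` and the seed are assumptions made for illustration.

```python
import random


def split_indices(n_total, n_train, n_val, n_test, seed=0):
    """Draw disjoint train/validation/test index lists from range(n_total)."""
    if n_train + n_val + n_test > n_total:
        raise ValueError("requested split is larger than the dataset")
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:n_train + n_val + n_test]
    return train, val, test


train_idx, val_idx, test_idx = split_indices(112_120, 20_000, 5_000, 10_000)
print(len(train_idx), len(val_idx), len(test_idx))  # 20000 5000 10000
```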
Preprocessing: The original images are 1024 x 1024 pixels. They are resized to 256 x 256 pixels, and the values of the resized images are then scaled down.
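To make the two preprocessing steps concrete, here is a dependency-free sketch. It rests on two assumptions the report does not confirm: that resizing can be approximated by averaging non-overlapping 4 x 4 blocks (1024 / 4 = 256), and that scaling means dividing 8-bit pixel values by 255. In practice an image library's resize function would do this work.

```python
def downsample_and_scale(image, factor=4, max_value=255.0):
    """Average non-overlapping factor x factor blocks, then scale to [0, 1]."""
    height, width = len(image), len(image[0])
    if height % factor or width % factor:
        raise ValueError("image size must be divisible by factor")
    out = []
    for r in range(0, height, factor):
        row = []
        for c in range(0, width, factor):
            block_sum = sum(
                image[r + dr][c + dc]
                for dr in range(factor)
                for dc in range(factor)
            )
            # Mean of the block, then scaled down by the assumed maximum value.
            row.append(block_sum / (factor * factor) / max_value)
        out.append(row)
    return out


# Tiny 8 x 8 stand-in for a 1024 x 1024 X-ray; factor=4 mirrors 1024 -> 256.
toy = [[255 if r < 4 else 0 for _ in range(8)] for r in range(8)]
print(downsample_and_scale(toy, factor=4))  # [[1.0, 1.0], [0.0, 0.0]]
```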

Figure 1: Samples from the training set for binary classification
Model Architecture
The output of the Global Max Pooling layer is passed through a ReLU-activated Dense layer with 512 units. It is followed by a dropout layer (drop rate of 0.2). The output layer is sigmoid-activated.

Figure 2: Model architecture
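To show what the classification head computes at inference time, here is a plain-Python sketch of the forward pass: global max pooling, a 512-unit ReLU layer, and a sigmoid output. The feature-map shape, the random weights, and all function names are assumptions for illustration; the convolutional layers that produce the feature map are not reproduced here.

```python
import math
import random


def global_max_pool(feature_map):
    """Reduce an H x W x C feature map to C values (max over H and W)."""
    channels = len(feature_map[0][0])
    return [
        max(pixel[ch] for row in feature_map for pixel in row)
        for ch in range(channels)
    ]


def dense(inputs, weights, biases):
    """Fully connected layer: one weight row per output unit."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def head_forward(feature_map, w_hidden, b_hidden, w_out, b_out):
    pooled = global_max_pool(feature_map)
    hidden = [max(0.0, v) for v in dense(pooled, w_hidden, b_hidden)]  # ReLU
    # Dropout (rate 0.2) only acts during training; at inference it is a no-op.
    logit = dense(hidden, [w_out], [b_out])[0]
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid score in (0, 1)


rng = random.Random(0)
C, UNITS = 8, 512  # C is an arbitrary channel count chosen for the demo
feature_map = [[[rng.random() for _ in range(C)] for _ in range(4)] for _ in range(4)]
w_hidden = [[rng.uniform(-0.05, 0.05) for _ in range(C)] for _ in range(UNITS)]
w_out = [rng.uniform(-0.05, 0.05) for _ in range(UNITS)]
score = head_forward(feature_map, w_hidden, [0.0] * UNITS, w_out, 0.0)
print(0.0 < score < 1.0)  # True
```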
Training-related specifics
- Adam optimizer with a learning rate of 0.001 is used.
- Cross-entropy loss is used.
- Model is trained with early stopping.
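Two of the training choices above are easy to sketch without a framework. The snippet assumes the cross-entropy loss is the binary form (the natural match for a sigmoid output) and that early stopping monitors validation loss with a patience of 3 epochs; neither the monitored quantity nor the patience is stated in this report.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def early_stopping_epoch(val_losses, patience=3):
    """Return the index of the epoch at which training would stop."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses) - 1  # never triggered: ran all epochs


# Made-up validation losses for illustration only.
val_losses = [0.70, 0.65, 0.66, 0.64, 0.65, 0.66, 0.67, 0.60]
print(early_stopping_epoch(val_losses, patience=3))  # 6
```

With these toy losses, the best value (0.64) occurs at epoch 3, and training stops after three further epochs without improvement, so the later 0.60 is never reached.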
Evaluation
Evaluation is done on the held-out test set. The ROC curve and the test error rate are used as evaluation metrics.
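For reference, both metrics can be computed from the true labels and the model's prediction scores. The sketch below is a minimal plain-Python version; the 0.5 decision threshold used for the error rate is an assumption, and the labels and scores are toy values rather than results from this report.

```python
def roc_points(y_true, y_score):
    """Return (FPR, TPR) pairs obtained by sweeping the decision threshold."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative sample")
    ranked = sorted(zip(y_score, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples tied at the same score are processed together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


def error_rate(y_true, y_score, threshold=0.5):
    """Fraction of samples whose thresholded score disagrees with the label."""
    wrong = sum((s >= threshold) != bool(y) for y, s in zip(y_true, y_score))
    return wrong / len(y_true)


y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
print(auc(roc_points(y_true, y_score)))  # 0.75
print(error_rate(y_true, y_score))       # 0.25
```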
Model Bias
The Data_Entry_2017_v2020.csv file that comes with the NIH Chest X-ray dataset contains class labels as well as patient data. The patient data provided are:
- Gender: Male or Female
- Age: Continuous value
No information about age or gender was provided to the model during training.
Bias Towards Gender
The model is evaluated on the male-only (blue) and the female-only (orange) subsets of the test data.
Observations
- The model gives better predictions for X-rays from male patients.
- This points to a gender imbalance in the training dataset.
- The bias therefore comes from the dataset.
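A subgroup comparison like this one boils down to computing the error rate separately for each group. Here is a minimal sketch that also returns the sample count per group, which helps when checking for imbalance. The `"M"`/`"F"` codes, the function name, and all values are made-up assumptions for illustration, not results from this report.

```python
from collections import defaultdict


def error_rate_by_group(groups, y_true, y_pred):
    """Return {group: (error_rate, sample_count)} for hard 0/1 predictions."""
    errors = defaultdict(int)
    totals = defaultdict(int)
    for group, y, p in zip(groups, y_true, y_pred):
        errors[group] += int(y != p)
        totals[group] += 1
    return {g: (errors[g] / totals[g], totals[g]) for g in totals}


# Made-up toy values for illustration only.
genders = ["M", "F", "M", "F", "M", "F"]
y_true = [1, 0, 0, 1, 1, 1]
y_pred = [1, 0, 0, 0, 1, 0]
for group, (rate, n) in error_rate_by_group(genders, y_true, y_pred).items():
    print(f"{group}: error rate {rate:.2f} over {n} samples")
```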
Bias Towards Age Groups
The continuous ages are bucketed using the edges [0, 10, 20, 30, 40, 50, 60, 70, 80, 90].
The model is evaluated on each bucket separately to measure its performance per age group.
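The bucketing step can be sketched with the standard library's `bisect` module. The sketch assumes half-open buckets (for example, age 10 falls into 10-20) and that ages outside the 0-90 range are left unassigned; the report does not spell out either convention.

```python
from bisect import bisect_right

BUCKET_EDGES = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]


def age_bucket(age):
    """Return the (low, high) bucket containing age, or None if out of range."""
    if age < BUCKET_EDGES[0] or age >= BUCKET_EDGES[-1]:
        return None
    i = bisect_right(BUCKET_EDGES, age) - 1
    return (BUCKET_EDGES[i], BUCKET_EDGES[i + 1])


for age in (0, 10, 47, 89, 90):
    print(age, age_bucket(age))
```

Once every test sample has a bucket, the per-bucket error rate follows by grouping predictions on that bucket.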
Observations
- Lung-related illness is expected to be more common in certain age groups (the mid-adult range).
- The test error rate is high for the 0-20 age group, which is acceptable.
- For the 70-90 age groups, there are likely fewer data samples.
These observations could be better quantified by incorporating domain knowledge.
Downloads
Download model:v1
import wandb
# initialize a wandb run
run = wandb.init()
# download model_nih_1.h5 as an artifact
artifact = run.use_artifact('wandb/model-card-NIH-Chest-X-ray-binary/model:latest')
artifact_dir = artifact.download()
# close the run
run.finish()
Limitations
Dataset
- The dataset used for training is a small subset of the full dataset.
- The image labels are NLP-extracted, so some of them may be erroneous. NLP labeling accuracy is estimated to be >90%.
Model
- For images such as X-rays, which are not natural images, vanilla convolutional neural network-based image classifiers are not sufficient.
Tags: Beginner, Computer Vision, Classification, Experiment, Panels, Plots, Kaggle, NIH X-ray, Health Care