Reproducible Models with Weights & Biases

How Weights & Biases optimised my attempt for the ML Reproducibility Challenge 2020. Made by Diganta Misra using Weights & Biases

Why is reproducibility important?

Research is often reduced to a single defining characteristic: novelty. As Albert Szent-Györgyi once said, “Research is to see what everybody else has seen, and to think what nobody else has thought.”
However, research (and science in general) is a dull blade if it can't be reproduced. As the field of deep learning matures, its research needs to stand the test of time and prove itself to be transparent and reproducible.
The ML Reproducibility Challenge is an initiative to corroborate confidence in the research being published at top conferences and journals. The challenge: Reproduce a paper and validate its central claim.
Motivated by my passion for academic research and open science, I attempted the ML Reproducibility Challenge 2020. I picked a CVPR 2020 paper titled "ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks" by Wang et al. (2020).
The paper proposes a novel form of channel attention mechanism known as Efficient Channel Attention (ECA), which can be plugged into standard deep convolutional neural network architectures to obtain a significant performance boost at the cost of an extremely small computational overhead. In this report, I show how I used Weights & Biases to reproduce several aspects of the paper and validate the claims presented in the paper.
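For readers unfamiliar with the mechanism, here is a minimal pure-Python sketch of the idea behind ECA as I understand it from the paper: each channel is squeezed to a single number by global average pooling, a small 1D convolution mixes neighbouring channel descriptors, and a sigmoid turns the result into per-channel weights. The function names, the rounding used for the kernel size, the uniform kernel standing in for learned weights, and the toy feature map are all assumptions for illustration; this is not the code from my repository.

```python
import math

def eca_kernel_size(channels, gamma=2, b=1):
    """Odd kernel size derived from (log2(C) + b) / gamma, with gamma=2 and b=1."""
    # One possible rounding: truncate, then bump even values up to the next odd one.
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 == 1 else t + 1

def eca_weights(feature_map, kernel):
    """Per-channel attention weights for a feature map given as [channel][row][col]."""
    # Squeeze: global average pooling gives one descriptor per channel.
    desc = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature_map]
    # Local cross-channel interaction: zero-padded 1D convolution over the channel axis.
    pad = len(kernel) // 2
    padded = [0.0] * pad + desc + [0.0] * pad
    mixed = [sum(w * padded[c + j] for j, w in enumerate(kernel)) for c in range(len(desc))]
    # Sigmoid gate keeps every weight strictly between 0 and 1.
    return [1.0 / (1.0 + math.exp(-m)) for m in mixed]

def apply_eca(feature_map, kernel):
    """Rescale each channel by its attention weight."""
    weights = eca_weights(feature_map, kernel)
    return [[[w * v for v in row] for row in ch] for ch, w in zip(feature_map, weights)]

# Toy 4-channel, 2x2 feature map (made-up values) and a placeholder kernel;
# in a real network the kernel weights are learned.
toy = [[[float(c + 1)] * 2 for _ in range(2)] for c in range(4)]
kernel = [1.0 / 3] * 3
out = apply_eca(toy, kernel)
```

The only learnable parameters in this sketch are the few kernel weights, which is in line with the paper's claim that the overhead is extremely small.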

W&B as a workspace

Reproducing a research paper can be a tall order and using the right tools and workspace matters a lot. For my reproducibility challenge, I used the Weights & Biases workspace to manage all my experiments and run-related data. Weights & Biases is a set of tools for deep learning practitioners and researchers.
In my opinion, Weights & Biases is a catalyst for transparency in research and upholds the core values of open science by empowering users with finely crafted tools to reduce friction in experimentation, allowing researchers to focus more on their ideas and less on logging.
To view the codebase for the results and experiments described below please visit my repository on GitHub.

Just Track It!

I first ran ResNet-18 models equipped with ECA (Efficient Channel Attention), CBAM (Convolutional Block Attention Module), SE (Squeeze-and-Excitation), or Triplet Attention, along with a variant with no additional attention mechanism, on the CIFAR-10 dataset. Each variant was run five times to give insight into the consistency and stability of the models' performance.
Weights & Biases makes this super easy to do. For example, I used the run grouping feature to visualise the mean and standard deviation of each group of training runs. To do the same, use the following code snippet:
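    import wandb

    num_runs = 5  # Number of runs for each group

    # Loop through the group: one W&B run per repetition
    for i in range(num_runs):
        run = wandb.init(project='My Reproducibility Project', group='My group')
        # ... train and evaluate here to obtain val_acc and loss ...
        wandb.log({'Val_Acc': val_acc, 'Val_Loss': loss})  # Log values to W&B
        run.finish()  # Close this run before starting the next one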
For more information, here is the documentation for Grouping and here is the code that I used to obtain the graph below:
Each group in the above graphs is named with a short tag for the model variant followed by CIFAR. Thus, TripletCIFAR refers to a ResNet-18 with Triplet Attention. Note: ResCIFAR denotes a ResNet-18 with no added attention mechanism, and ECCIFAR is the ResNet-18 model with ECA added to it.
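To launch several repetitions for several variants in one go, the grouping loop can be wrapped in a small helper. This is only a sketch under assumptions: `plan_runs`, `launch` and `train_fn` are hypothetical names, only the three group names mentioned above are used, and `wandb` is imported lazily so that the planning step works without it.

```python
def plan_runs(groups, num_runs, project="My Reproducibility Project"):
    """Return one dict of wandb.init keyword arguments per (group, repetition)."""
    return [
        {"project": project, "group": group, "name": f"{group}-run{i}"}
        for group in groups
        for i in range(num_runs)
    ]

def launch(plan, train_fn):
    """Start one W&B run per plan entry; train_fn(run) is a hypothetical training hook."""
    import wandb  # imported here so plan_runs stays usable without wandb installed
    for kwargs in plan:
        run = wandb.init(**kwargs)
        train_fn(run)  # expected to call run.log({...}) once per epoch
        run.finish()   # close the run before the next one starts

plan = plan_runs(["ResCIFAR", "ECCIFAR", "TripletCIFAR"], num_runs=5)
```

Because every run in a group shares the same `group` value, the dashboard can aggregate the runs into one line per variant.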
As shown in the accuracy graph, the ResNet-18 equipped with ECA obtained the highest Top-1 accuracy of 78.07%, which validates the performance increment showcased in the paper. You can hover your mouse over any point on the above graphs to get the mean value of the group of runs along with the min and max of that group at that epoch. This is represented in the format: mean σ std_dev (min, max).
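To make the tooltip format concrete, the following sketch computes the same four statistics for one epoch of a group of runs. The accuracies are made-up toy values, not results from my runs, `summarise_group` is a hypothetical helper, and the use of the sample standard deviation is an assumption; I have not checked which estimator the dashboard uses.

```python
import statistics

def summarise_group(values):
    """Format one epoch's metric across a group as 'mean σ std_dev (min, max)'."""
    mean = statistics.mean(values)
    # Sample standard deviation (an assumption); a single run has no spread.
    std = statistics.stdev(values) if len(values) > 1 else 0.0
    return f"{mean:.2f} σ {std:.2f} ({min(values):.2f}, {max(values):.2f})"

# Made-up validation accuracies of five runs at one epoch
toy_accs = [77.0, 78.0, 79.0, 78.0, 78.0]
summary = summarise_group(toy_accs)
```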

Sweep the board!

Deep learning hyper-parameters can boost a model's performance over other models, or lead it to its grave.
Hyper-parameters are often selected through a mysterious combination of intuition, prior literature and hypotheses, and a small sprinkle of luck. However, the W&B Sweeps tool offers a transparent way of analysing how different combinations of hyper-parameters perform.
I used Sweeps to investigate the effect of channel attention module (ECA, CBAM, SE, Triplet), optimiser (Adam, SGD), and batch size (64, 128, 256) on the loss value.
  1. Create a config.yaml file containing the hyper-parameter settings:
     program: train_cifar.py
     method: bayes
     metric:
       name: loss
       goal: minimize
     parameters:
       att:
         values: ["ECA", "CBAM", "SE", "Triplet"]
       optimizer:
         values: ["adam", "sgd"]
       batch_size:
         values: [64, 128, 256]
  2. Initialize the sweep with the config.yaml file and generate a sweep ID:
     wandb sweep config.yaml
  3. Use the sweep ID to run the sweep:
     wandb agent {insert sweep ID}
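The same sweep can also be driven from Python instead of the command line. The sketch below mirrors the YAML settings as a dictionary and uses `wandb.sweep` and `wandb.agent`; `run_training`, `train`, `run_sweep` and `search_space` are hypothetical names, and `run_training` is a placeholder for the real training loop in train_cifar.py.

```python
from itertools import product

sweep_config = {
    "method": "bayes",
    "metric": {"name": "loss", "goal": "minimize"},
    "parameters": {
        "att": {"values": ["ECA", "CBAM", "SE", "Triplet"]},
        "optimizer": {"values": ["adam", "sgd"]},
        "batch_size": {"values": [64, 128, 256]},
    },
}

def search_space(config):
    """Enumerate every hyper-parameter combination the sweep can choose from."""
    names = list(config["parameters"])
    grids = [config["parameters"][n]["values"] for n in names]
    return [dict(zip(names, combo)) for combo in product(*grids)]

def run_training(att, optimizer, batch_size):
    """Placeholder (hypothetical): train the model and return the final loss."""
    raise NotImplementedError("replace with the real training loop")

def train():
    """Entry point called by the agent; run.config holds the sampled values."""
    import wandb
    run = wandb.init()
    cfg = run.config
    loss = run_training(cfg.att, cfg.optimizer, cfg.batch_size)
    run.log({"loss": loss})  # the key must match metric.name in the sweep config
    run.finish()

def run_sweep(count=10):
    """Create the sweep and run `count` trials in this process."""
    import wandb
    sweep_id = wandb.sweep(sweep_config, project="My Reproducibility Project")
    wandb.agent(sweep_id, function=train, count=count)

combos = search_space(sweep_config)
```

With 4 attention modules, 2 optimisers and 3 batch sizes there are 24 possible combinations, so a Bayesian search with a modest `count` explores only part of the grid.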
For more details, please refer to the Sweeps documentation or look at the Jupyter notebook I used to create the following graph:
The combination of hyper-parameters that obtained the lowest loss (0.57) for my ResNet model was ECA with the SGD optimiser and a batch size of 128. I also observed that most of the combinations using ECA obtained a lower loss than the other attention variants. This was confirmed by the parameter importance chart, where ECA is the parameter with the highest importance and the most negative correlation with loss.

Media? Media!

In this section, I deconstruct the training process of the Mask R-CNN model equipped with an ECA-Net 50 backbone by visualising the bounding box and segmentation map results of a few sample images from the MS-COCO 2017 dataset at each of the 12 epochs. This allows me to understand and interpret how training progresses, along with the model's per-epoch stability.
If you click the gear icon in the top left of the panel shown below, you can use the slider to visualise the model's results at each epoch.
I used the MMDetection framework to train the model on MS-COCO. You can use the following snippet in your inference pipeline to log results in the same way:
    import wandb

    wandb.init(project='My Reproducibility Project')

    for checkpoint in checkpts:  # checkpts is the directory containing each epoch's weights
        log_img = []
        model = ...  # define the model here
        for i in img_list:  # img_list contains the paths to the images we obtained results for
            result = ...  # inference pass of the model for the image i
            log_img.append(wandb.Image(result))
        # One log call per checkpoint, so each epoch becomes one step of the slider
        wandb.log({"Segmentation and BBox Results": log_img})
For the full inference pipeline, look at my inference notebook. Weights & Biases offers many more features for logging bounding boxes and segmentation maps.
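One of those features is logging boxes and masks as interactive overlays instead of images with the results already drawn on them. The sketch below assumes detections arrive as (x1, y1, x2, y2, score, class_id) tuples in pixel coordinates and that `wandb.init` has already been called; `to_box_data` and `log_detections` are hypothetical helper names, and the single detection is made up.

```python
def to_box_data(detections, class_names):
    """Convert (x1, y1, x2, y2, score, class_id) tuples to W&B box dictionaries."""
    return [
        {
            "position": {"minX": x1, "minY": y1, "maxX": x2, "maxY": y2},
            "domain": "pixel",  # coordinates are pixels, not fractions of the image size
            "class_id": cid,
            "box_caption": f"{class_names[cid]} {score:.2f}",
            "scores": {"score": score},
        }
        for x1, y1, x2, y2, score, cid in detections
    ]

def log_detections(image, detections, class_names, mask=None):
    """Log one image with overlays; mask is an optional 2D integer array of class ids."""
    import wandb  # lazy import; requires an active run
    boxes = {"predictions": {"box_data": to_box_data(detections, class_names),
                             "class_labels": class_names}}
    masks = None
    if mask is not None:
        masks = {"predictions": {"mask_data": mask, "class_labels": class_names}}
    wandb.log({"Segmentation and BBox Results": wandb.Image(image, boxes=boxes, masks=masks)})

# Made-up detection for illustration only
class_names = {1: "person"}
box_data = to_box_data([(10, 20, 110, 220, 0.9, 1)], class_names)
```

Logged this way, individual classes can be toggled on and off in the media panel.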

More Features

Weights & Biases is a collection of tools that make research more transparent and engaging. The ability to log almost anything at the cost of a few lines of code is probably one of the most powerful weapons a researcher can wield.
Some of the more advanced features that could be useful for more researchers include:
  1. Artifacts: Probably my favourite out of the lot, Artifacts allows you to store your model checkpoints and dataset versions. As the cherry on top, Artifacts even allows you to visualise it all in a computational graph in the dashboard.
  2. DSViz: Currently in development, DSViz is an amazing tool that breathes fresh air into Exploratory Data Analysis (EDA). DSViz gives you complete coverage of your dataset, which allows you to inspect a model's evaluation on samples and debug them more efficiently.
  3. Custom Charts: Custom Charts lets you log even the most complicated graphs. It is designed for advanced insights and lets you log anything from ROC curves to attention maps.
  4. Hardware Metrics: Weights & Biases automatically logs system and hardware metrics in real time for every run, which allows you to analyse model complexity in terms of GPU usage or memory allocated.
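As an example of the first item, a checkpoint can be versioned with Artifacts in a few lines. This is a sketch under assumptions: the artifact name `eca-resnet18` and the helpers `checkpoint_metadata` and `log_checkpoint` are hypothetical, and the temporary file merely stands in for a real checkpoint.

```python
import hashlib
import os
import tempfile

def checkpoint_metadata(path, epoch):
    """Collect simple, verifiable facts about a checkpoint file."""
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    return {"epoch": epoch, "size_bytes": os.path.getsize(path), "sha256": digest}

def log_checkpoint(run, path, epoch, name="eca-resnet18"):
    """Upload a checkpoint as a versioned model artifact on an active W&B run."""
    import wandb  # lazy import so the metadata helper works without wandb
    artifact = wandb.Artifact(name, type="model", metadata=checkpoint_metadata(path, epoch))
    artifact.add_file(path)
    run.log_artifact(artifact)

# Stand-in file so the metadata helper can be tried without a real checkpoint
with tempfile.NamedTemporaryFile(delete=False, suffix=".pth") as tmp:
    tmp.write(b"abc")
meta = checkpoint_metadata(tmp.name, epoch=1)
os.remove(tmp.name)
```

Storing the hash and epoch as metadata makes it easier to confirm later which weights produced a given result.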
The beauty of deep learning is in exploration; the more you dive into it, the more vibrant it gets. Weights & Biases works in parallel with that exploration, helping your deep learning and machine learning research projects stand out while keeping them fully transparent and reproducible.


The fields of deep learning and machine learning are still developing and have a lot of room for growth. Initiatives like the ML Reproducibility Challenge serve as a way to keep track of the fast-paced research being conducted and published at top conferences, and tools like Weights & Biases serve as an important reminder that research and appropriate tools go together in the quest for scientific progress.
In my opinion, Weights & Biases is a tool that should be in every researcher's kit.
Thank You!