
How to track all your experiments using Microsoft Excel?

In this report, I'll show you how to use MS Excel (and maybe a better way, too) to track all your experiments.
Created on February 15 | Last edited on April 7


Introduction

As machine learning engineers, we have to run hundreds of experiments to make sure that our final model is robust and generalizes to the test set.
A simple answer to a new idea such as "Will X work?" is to try X out and run a small experiment!
That means we have to run quite a number of experiments before arriving at the final solution. So how should we keep track of all these experiments? Well, by using MS Excel, of course!
Let's take the melanoma example from the project this report belongs to and start using MS Excel to track our experiments.
For an introduction to the problem statement, you can also refer to the report "How to build a robust medical model using Weights and Biases."

A list of Experiments

Since the dataset is highly skewed (around 98% of the total cases are "benign" whereas only about 2% are "malignant"), there are many things we'd like to try to get a high score when it comes to model training:
  1. There are many losses to choose from - is the typical Binary Cross-Entropy loss going to work or would we need to try focal loss?
  2. Would a weighted loss work better for our case?
  3. Are models pre-trained with ImageNet going to be helpful or should we train the models from scratch?
  4. How should we pre-process or resize the images?
  5. Would we need to try some kind of specific preprocessing such as "color constancy" that might give a boost to our scores?
  6. How should we use metadata such as "sex" and "age" in our models? Would it even be beneficial?
  7. What data augmentations should we add for our model training to make them robust?
  8. With hundreds of model architectures to choose from, which ones would work best for Melanoma classification?
  9. Is there another external dataset that we could perhaps pre-train our models on before finetuning on Melanoma classification?
  10. What learning rate should we use for our model training?
  11. What kind of learning rate schedule would work best?
  12. What should be the training batch size in accordance with the learning rate?
  13. Should we use gradient accumulation?
  14. Which parameters correlate best with the final validation "AUC" metric?
  15. What image size should we use for model training?
Now that we know the kinds of experiments we want to run, let's see how we can utilize MS Excel and start keeping track of them all.
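Before opening Excel, it can help to gather all of these knobs in one place. Here is a minimal Python sketch of such a configuration; every key and default value below is an illustrative placeholder rather than the project's actual settings:

```python
# Illustrative configuration covering the experiment dimensions listed above.
# All names and default values are hypothetical placeholders.
experiment_config = {
    "loss": "weighted_focal",          # vs. "bce" or plain "focal"
    "pretrained": True,                # ImageNet weights vs. training from scratch
    "image_size": 256,                 # resize / preprocessing resolution
    "color_constancy": False,          # optional preprocessing step
    "use_metadata": True,              # include patient metadata such as sex and age
    "augmentations": ["hflip", "rotate"],
    "architecture": "efficientnet_b0",
    "external_pretraining": None,      # e.g. an external skin-lesion dataset
    "learning_rate": 3e-4,
    "lr_schedule": "cosine",
    "batch_size": 64,
    "grad_accumulation": 1,
}
```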

Model Tracking using Microsoft Excel

Step-1: Create an empty Excel sheet with all possible experiments as column names

Since we have about 15 different experiments that we may want to run, let's create an Excel sheet with these as column names.
Fig-1: Empty Excel sheet with all experiments as column names
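If you'd rather script this step than type the headers by hand, here is a minimal pandas sketch. The column names and file name are illustrative, and writing .xlsx files needs openpyxl installed:

```python
import pandas as pd

# Column names are illustrative; use whichever experiment knobs you plan to vary.
columns = [
    "experiment", "loss", "pretrained", "image_size", "color_constancy",
    "use_metadata", "augmentations", "architecture", "learning_rate",
    "lr_schedule", "batch_size", "grad_accumulation", "val_auc",
]

# Write an empty sheet with these headers (requires openpyxl).
pd.DataFrame(columns=columns).to_excel("experiments.xlsx", index=False)
```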

Step-2: Start with the best possible initial values for each of these columns and train the model

We call the first experiment "Experiment-0" and start with some initial values. We note that this gives us a validation AUC of ~0.63, which is not great, so we will need to try more experiments.
Fig-2: First experiment with initial hyperparameter values
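To record that run in the sheet programmatically, a sketch along the same lines might look like the following. Only the ~0.63 validation AUC comes from the text above; every other value is a placeholder:

```python
import pandas as pd

# Append the first run to the tracking sheet. Only the ~0.63 validation AUC
# comes from the report; the other values here are illustrative placeholders.
df = pd.read_excel("experiments.xlsx")
experiment_0 = {
    "experiment": "Experiment-0",
    "loss": "weighted_focal",
    "pretrained": True,
    "image_size": 256,
    "learning_rate": 3e-4,
    "batch_size": 64,
    "val_auc": 0.63,
}
df = pd.concat([df, pd.DataFrame([experiment_0])], ignore_index=True)
df.to_excel("experiments.xlsx", index=False)
```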

Step-3: Change one column value at a time and see if it makes a difference

The reason we change one column value at a time and keep everything else the same is that we want to see whether that particular change really matters for model performance.
So, let's try Binary Cross-Entropy (BCE) loss instead of Weighted Focal loss.
We get a slightly improved AUC score, so we will use BCE loss for all subsequent experiments.
Fig-3: Second experiment with BCE loss instead of focal loss
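In code, this single-variable change can amount to swapping how the criterion is constructed while keeping everything else fixed. A minimal PyTorch sketch, where the focal-loss alpha and gamma values are purely illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Experiment-0 used a weighted focal loss; alpha and gamma below are illustrative.
def weighted_focal_loss(logits, targets, alpha=0.75, gamma=2.0):
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = torch.exp(-bce)  # model's probability for the true class
    weight = alpha * targets + (1 - alpha) * (1 - targets)
    return (weight * (1 - p_t) ** gamma * bce).mean()

# Experiment-1 swaps in plain BCE; this is the only change for that run.
criterion = nn.BCEWithLogitsLoss()
```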

Step-4: Change one value at a time, run a lot of experiments, and color-code the AUC metric

By now, you get the point: we continue to change one value at a time and run a lot of experiments over a period of a few days or weeks. Towards the end, we should have one massive Excel sheet with a lot of experiments and their validation AUC scores.
Fig-4: After running multiple experiments by changing one value at a time
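For the color coding, one option is to let pandas do it instead of applying conditional formatting by hand. A small sketch, assuming the sheet and column names from earlier (needs matplotlib for the gradient and openpyxl for the .xlsx output):

```python
import pandas as pd

# Color-code the validation AUC column so the strongest runs stand out.
# File and column names are illustrative.
df = pd.read_excel("experiments.xlsx")
styled = df.style.background_gradient(subset=["val_auc"], cmap="Greens")
styled.to_excel("experiments_colored.xlsx", index=False)
```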

Summary of tracking experiments with Microsoft Excel

I hope by now you get the gist of what it looks like to track experiments using Microsoft Excel. Can you imagine the process?
  1. Tweak some values
  2. Kick off a training run with the updated values
  3. Come back to the training run after a few hours to note down the final validation metric value at the end of the last epoch
  4. Go back to Excel and update the validation AUC
  5. Go back to step 1 with new values and repeat until you have a good working model
Does this not make you wonder if there is a better way to track all your experiments? Is Microsoft Excel really the right tool for experiment tracking?
Here are a few of the disadvantages I can think of when using Microsoft Excel for experiment tracking:
  1. You don't see progress per epoch: Towards the end, we only see one final value of the validation metric. What if the validation metric had peaked before the final epoch?
  2. Typing errors: Since there is a human in the loop, one could very easily enter incorrect values and then, later on, be confused when looking at the Excel sheet.
  3. Hard to distill results: In the example above, I have listed only about 6 experiments, but when I was actually participating in the Melanoma competition, I ran over 275 different experiments over a period of 8 weeks. It was very difficult for me to distill the information, and I was left wondering which experiments performed the best and why.

A "cooler" way to track your experiments

Have you ever used a tool specially designed for tracking your experiments, such as Weights and Biases (W&B)?
Before telling you how to use W&B for experiment tracking, let me share some benefits that might encourage you to make the move:
  1. Part of a bigger ecosystem: When using W&B for experiment tracking, you'll be part of a bigger ecosystem that can do more than just experiment tracking - you will also be able to run hyperparameter sweeps, store model weights in the cloud, write reports, and share results with your teammates!
  2. Everything in one place: The best part? All your experiments, results, model weights, notes, and reports can be viewed in a single dashboard, which makes it much easier to distill information!
  3. Share results and experiments with your teammates: By using W&B, you can also share your experiments with your teammates. In fact, you and your colleagues could be working on the same project, and this will give you visibility on what experiments your colleagues are running too!
  4. Track progress per epoch: You don't just get the final validation metric scores; you can also see how those scores change per epoch!
  5. It's all free: W&B is free to start and you don't need to pay any money to get started with W&B to track all your experiments.
  6. Store datasets as W&B tables: Yes, that's right! You can also store your training and validation datasets with W&B and play around with them to understand more about your data. This means being able to log images, audio, videos, or even a simple Pandas DataFrame (if that's what you prefer).
  7. It's all automated: Just by adding a few lines of code to your existing training script, you can start using W&B, and the tool will keep track of everything for you without you having to manually input numbers. That means no transcription errors in your validation metric scores, and it also makes W&B a great experiment-tracking tool for audit purposes! A minimal integration sketch follows this list.
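To make that last point concrete, here is a minimal sketch of what the integration might look like. The project name, config values, and the placeholder training/evaluation functions are all illustrative stand-ins for your own code:

```python
import wandb

# A minimal sketch: the project name, config values, and the placeholder
# train_one_epoch / evaluate functions stand in for your existing training code.
def train_one_epoch():
    return 0.42  # placeholder training loss

def evaluate():
    return 0.80  # placeholder validation AUC

run = wandb.init(
    project="melanoma-classification",  # illustrative project name
    config={"loss": "bce", "learning_rate": 3e-4, "batch_size": 64, "image_size": 256},
)

for epoch in range(10):
    train_loss = train_one_epoch()
    val_auc = evaluate()
    # One call per epoch replaces the manual Excel entry and keeps the full history.
    wandb.log({"epoch": epoch, "train_loss": train_loss, "val_auc": val_auc})

run.finish()
```

Logging a dataset as a W&B Table works along the same lines, for example `wandb.log({"val_data": wandb.Table(dataframe=val_df)})` for a pandas DataFrame.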

Using Weights and Biases for experiment tracking

Want to see what the same Melanoma project looks like when you use W&B for experiment tracking?
Fig-5: Sample dashboard on Melanoma project when using W&B for experiment tracking
There's a lot that this tool can do! But, we'll take it slow. If you want to see a high-level report that showcases the power of this tool, have a quick read of "How to build a robust medical model using Weights and Biases".



A simple chart like the one above, tracking validation metric scores per epoch across all the named experiments, can be hugely beneficial. As humans, it's easier for us to distill information when it's presented visually rather than as rows in an Excel table.
Just by hovering over the chart, we can see that the run "ethereal-blaze-106" performs the best. Next step? Just go over to the run and see everything in one place.
Fig-7: Model artifacts, experiment config, environment requirements.txt file, code files are all in one place

Fig-8: All training and validation data are also part of the run to make it easier to play with your datasets

Fig-9: See what the model learns using the "W&B embedding projector" and also store out-of-fold (OOF) predictions

Fig-10: Keep track of validation metric and loss and see how it changes over time per epoch
And this is not all! There's so much more that this tool can do. Below, I list a few resources that will help you get started with W&B and use it for all your future experiments.

Get started with W&B

  1. Weights and Biases Quickstart: The quickest way to get started with W&B is to use the quickstart. It's in PyTorch and should get you started with W&B very quickly.
  2. How to build a robust medical model using Weights and Biases: This next report features model artifacts, sweeps, experiment tracking, W&B embedding projector, and W&B tables too!
  3. ResNet Strikes Back: A Training Procedure in TIMM: A tutorial on how to integrate W&B in your research.
  4. Tracking CO2 Emissions of Your Deep Learning Models with CodeCarbon and Weights & Biases: A report that can help make all your future models environment-friendly! 
  5. How Weights and Biases Can Help with Audits & Regulatory Guidelines: How using W&B for experiment tracking can prepare you for all your future audits!
  6. How Weights & Biases and MS Fairlearn can help deal with Model and Dataset Bias: Use W&B to learn more about your model and dataset bias.
  7. Interpret any PyTorch Model Using W&B Embedding Projector: Use W&B's powerful embedding projector to visualize what the model has learned during training.
