Intro to MLOps: Machine Learning Experiment Tracking

Imagine you are trying to develop a recipe for the best chocolate chip cookies. After the first try, you might increase the amount of flour. One time, you might add more chocolate chips. Another time you might try it with some walnuts. In the end, you might have tried a dozen recipes, but which was the best? 

I’m sure you agree that taking notes during this process would be a good idea. You should probably write down the ingredients of each recipe and how the resulting cookies tasted. 

This approach also applies to developing Machine Learning (ML) models. Developing an ML model takes many experiments because small changes in the input - like the ingredients - can greatly impact the results - like the taste of the cookies. Thus, tracking your experiments is a good idea to avoid losing sight of what worked and what didn’t. 

What is Experiment Tracking in Machine Learning?

In the machine learning workflow, experiment tracking is the process of saving relevant metadata for each experiment and organizing the experiments. In this context, an ML experiment is a systematic approach to testing a hypothesis, and its relevant metadata contains the experiment’s inputs and outputs.
Examples would be:
  • Hypothesis – If I increase the number of epochs, the validation accuracy will increase.
  • Inputs – Code, datasets, or hyperparameters.
  • Outputs – Metrics and models.
The development of an ML model aims to find the best model with respect to criteria such as a metric, resource usage, or inference time, depending on your constraints. This iterative development process involves running many experiments, analyzing their results and comparing them to those of other experiments, and trying new ideas to arrive at the best-performing configuration. 
For this purpose, we track the inputs, such as
  • code,
  • training and validation data (including e.g. different features and data augmentations)
  • and model architecture and model hyperparameters, 
and the outputs of an ML experiment, such as 
  • evaluation metrics
  • and model weights.
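
To make this concrete, here is a minimal sketch of how the metadata of a single experiment could be represented in Python. The class name, the field names, and all values are made up for illustration; which fields you actually need depends on your project.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class ExperimentRecord:
    """Metadata of one ML experiment (all field names are illustrative)."""

    hypothesis: str
    # Inputs
    code_version: str  # e.g., a Git commit hash
    dataset: str  # e.g., a dataset name or version tag
    hyperparameters: dict = field(default_factory=dict)
    # Outputs
    metrics: dict = field(default_factory=dict)
    model_path: str = ""  # where the trained weights were saved


# Made-up values for a single experiment.
record = ExperimentRecord(
    hypothesis="If I increase the number of epochs, the validation accuracy will increase.",
    code_version="abc1234",
    dataset="train-v1",
    hyperparameters={"epochs": 20, "learning_rate": 0.01},
    metrics={"val_accuracy": 0.84},
    model_path="models/run_002.pt",
)

# A plain dictionary is easy to write to a text file or spreadsheet later.
print(asdict(record))
```

Keeping inputs and outputs together in one record means that every result can be traced back to the configuration that produced it.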

Why Do You Need to Track Your ML Experiments?

Because slight changes in the inputs can lead to entirely different results, you will run many experiments to develop the best model. Without logging the inputs and outputs and organizing the experiments, you can quickly lose sight of what worked and what didn’t.
Thus, tracking your ML experiments in an organized way can help you in the following aspects:
  • Overview: How many and what experiments were run?
  • Details and Reproducibility: What were the details of the experiments, and how can we reproduce the results?
  • Comparison: Which ideas and what changes led to improvements? 
With the information gained, you can focus on new approaches and improving the prototypes rather than trying to make sense of a large number of unorganized experiments.

How Do You Track Machine Learning Experiments?

You can track your machine learning experiments either manually or automatically with the help of different tools. You can manually track your experiments with pen and paper or digitally in text files or spreadsheets. Or you can automate the task by adding logging functions to your code or using modern experiment tracking tools.
For each approach, we will go through three steps: setting it up, logging the inputs and outputs, and retrieving the information. 

Manual Experiment Tracking

Suppose you develop machine learning models on your own or in a small team and only run a manageable number of experiments. In that case, you can easily track your experiments manually with pen and paper, text files, or spreadsheets. 
This approach is straightforward and a great way to track your experiments when you get started. 
However, it has quite a few downsides. First, manually logging all relevant metadata of an experiment requires discipline and time. Second, mistakes are almost inevitable when you log by hand. Third, if you lose your notes (much like losing code that was never under version control), you might have to rerun many, if not all, of your experiments. Last, this approach does not scale well when you need to run a lot of experiments, and a task this tedious is a natural candidate for automation.
For example, let’s run through the rough workflow of manual experiment tracking with pen and paper.

1. Setup

Grab a pen and a notebook, and you’re all set and ready to go!


2. Logging Inputs and Outputs

In a nutshell, you write down what you are doing for each experiment. This can range from a simple table, where you write down the model’s hyperparameters and the resulting performance, to a collection of entries for each experiment with more details, such as the metric at each epoch during training.


3. Information Retrieval

Depending on how neatly you took notes during the development phase, you might find information quickly – or not.
If you track your experiments manually in a digital artifact like a text file or spreadsheet, you can also search for specific terms or filter and sort columns. 


Automated Experiment Tracking without Experiment Tracking Tools

A popular way to track machine learning experiments is to automate the tedious work of writing down everything that might be important by adding logging functionality to your code.
While this approach takes a little more effort to set up than manual experiment tracking, it is easy to implement and saves you time in the long run because it is less error-prone: you avoid the mistakes that come with logging by hand and are less likely to lose your notes.
For example, let’s run through the rough workflow of automated experiment tracking by writing code to log information to a spreadsheet.


1. Setup

Set up a spreadsheet. There are various ways to add logging functionality to your code. In this example, we will read the spreadsheet into a pandas DataFrame and append a new row for each experiment.
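
A minimal sketch of this setup step could look as follows. It assumes the spreadsheet is stored as a CSV file (which any spreadsheet program can open), and the file name, function name, and column names are hypothetical.

```python
from pathlib import Path

import pandas as pd

# Hypothetical columns; pick whatever metadata matters for your project.
COLUMNS = ["experiment_id", "learning_rate", "epochs", "val_accuracy"]


def load_experiment_log(path):
    """Return the experiment log as a DataFrame, or an empty one if no file exists yet."""
    path = Path(path)
    if path.exists():
        return pd.read_csv(path)
    return pd.DataFrame(columns=COLUMNS)


# "experiments.csv" is an assumed file name.
experiments = load_experiment_log("experiments.csv")
print(experiments.columns.tolist())
```

Starting from an empty DataFrame when the file is missing means the same code works for the very first experiment and for every one after it.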


2. Logging Inputs and Outputs

Next, log all relevant experiment metadata to a single dictionary. Then, you can append the experiment’s dictionary to the pandas DataFrame as a new row. In the end, you can save the pandas DataFrame back to the spreadsheet.
You could also automate saving relevant plots to a dedicated folder.
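
The logging step could be sketched like this. Again, the spreadsheet is assumed to be a CSV file, the function name is hypothetical, and the logged values are made up. The demo writes to a temporary directory so that it does not touch any real files.

```python
import tempfile
from pathlib import Path

import pandas as pd


def log_experiment(log_path, metadata):
    """Append one experiment (a flat dict of inputs and outputs) to a CSV log."""
    log_path = Path(log_path)
    new_row = pd.DataFrame([metadata])
    if log_path.exists():
        # Read the existing log and add the new experiment as the last row.
        log = pd.concat([pd.read_csv(log_path), new_row], ignore_index=True)
    else:
        log = new_row
    log.to_csv(log_path, index=False)
    return log


# Usage with made-up values.
with tempfile.TemporaryDirectory() as tmp_dir:
    log_path = Path(tmp_dir) / "experiments.csv"
    log_experiment(
        log_path,
        {"experiment_id": "baseline", "learning_rate": 0.01, "epochs": 10, "val_accuracy": 0.81},
    )
    log = log_experiment(
        log_path,
        {"experiment_id": "more_epochs", "learning_rate": 0.01, "epochs": 20, "val_accuracy": 0.84},
    )

print(log)
```

The sketch uses pd.concat to add the row because DataFrame.append was removed in pandas 2.0.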

3. Information Retrieval

In your spreadsheet, you can now search, filter, and sort results from different experiments. 
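
If you prefer to stay in Python, the same search, filter, and sort operations can be sketched with pandas. The experiment log below is made up and stands in for the contents of your spreadsheet.

```python
import pandas as pd

# Made-up experiment log standing in for the spreadsheet's contents.
log = pd.DataFrame(
    [
        {"experiment_id": "baseline", "learning_rate": 0.01, "epochs": 10, "val_accuracy": 0.81},
        {"experiment_id": "more_epochs", "learning_rate": 0.01, "epochs": 20, "val_accuracy": 0.84},
        {"experiment_id": "high_lr", "learning_rate": 0.1, "epochs": 20, "val_accuracy": 0.62},
    ]
)

# Filter: keep only the experiments trained for 20 epochs.
long_runs = log[log["epochs"] == 20]

# Sort: best validation accuracy first.
ranked = log.sort_values("val_accuracy", ascending=False)

# Search: find experiments whose name contains a given term.
matches = log[log["experiment_id"].str.contains("epochs")]

best = ranked.iloc[0]
print(best["experiment_id"], best["val_accuracy"])
```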

Automated Experiment Tracking with Experiment Tracking Tools

Finally, there are modern experiment tracking tools, which are solutions built specifically for tracking, organizing, and comparing experiments. There are several popular options, such as
  • CometML,
  • MLflow,
  • Neptune,
  • TensorBoard, and of course
  • Weights & Biases. 
For example, let’s run through the rough workflow of automated experiment tracking with Weights & Biases. You can find the related code in my Kaggle Notebook and experiments in my W&B Dashboard.

1. Setup

Import the wandb library in Python and initialize a new run with wandb.init() at the beginning of your code. 


2. Logging Inputs and Outputs 

Inputs and outputs are logged in different ways so that you can tell them apart. To log inputs, use the wandb.config object in your code to save your training configuration. To log outputs, call wandb.log(dict) to log a dictionary of metrics, media, or custom objects to a step.


3. Information Retrieval

You can get an overview in the W&B dashboard, which is a central place to organize your ML experiments wherever you train your models (local machine, lab cluster, spot instances in the cloud). In this dashboard, you can search, filter, sort, and group results from different experiments and even compare selected experiments with each other.
You can also compare logged values, like metrics over epochs, in this dashboard.
Additionally, in the W&B Experiment dashboard, you can see the details and metadata of an experiment.


Whether you track your ML experiments with pen and paper or with an experiment tracking tool, tracking your experiments can save you a lot of time and headaches when you develop an ML model.
We have discussed that tracking your inputs (e.g., code, datasets, or hyperparameters) is important to reproduce the results and to get an idea of what worked and what didn’t. Saving the outputs (e.g., metrics and models) is equally important because you need to compare your results to find the best model according to your constraints, e.g., performance.
Also, we reviewed three general approaches to ML experiment tracking: manual tracking, automated tracking with your own logging code, and automated tracking with dedicated experiment tracking tools.