Introduction

There are different levels of stochasticity in machine learning. Sometimes it enters through the process of sampling the dataset, and other times through the machine learning models themselves (neural networks in particular). While stochasticity brings a number of advantages during model training, it also introduces some gnarly reproducibility problems.

GitHub repository →

In this report, we'll go over some of the methods that promise to make our machine learning experiments more reproducible. Before we jump into the nitty-gritty, we'll discuss some of the motivation behind ensuring our machine learning experimentation is reproducible.

Let's get started!


Why do we care about reproducibility?

To start this section, I will borrow something from Joel Grus's talk Reproducibility as a Vehicle for Engineering Best Practices -

Joel presented a number of very important points as to why reproducibility in ML is necessary. Here are some of them -

To top it all off (from Joel's aforementioned talk) -

[...] software engineering best practices will make you a better researcher.

Honestly, although I knew about reproducibility, it was only after going through Joel's deck that I truly understood the urgent need for it.

This report focuses on developing reproducible models, which in turn takes care of most of the issues that arise from non-reproducibility.

Developing reproducible models

It's almost impossible to cover all the ML models and frameworks out there and talk about reproducibility in a single report. So, we are just going to focus on one pair - neural networks and TensorFlow. Note that most of these concepts still apply to other frameworks.
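The most common starting point for reproducible neural network training is fixing the random seeds of every source of randomness you control. Below is a minimal sketch of the idea using only the Python standard library; `set_global_seed` is a hypothetical helper name, and in a real TensorFlow project you would additionally call `np.random.seed(seed)` and `tf.random.set_seed(seed)` inside it.

```python
import random

def set_global_seed(seed: int = 42) -> None:
    """Hypothetical helper: fix the stdlib RNG so runs are repeatable.

    In a TensorFlow project you would also seed NumPy and TensorFlow here:
        np.random.seed(seed)
        tf.random.set_seed(seed)
    """
    random.seed(seed)

# Two runs with the same seed produce identical random sequences.
set_global_seed(42)
first_run = [random.randint(0, 100) for _ in range(5)]

set_global_seed(42)
second_run = [random.randint(0, 100) for _ in range(5)]

assert first_run == second_run
```

Seeding alone does not guarantee bit-identical results across hardware (some GPU ops are nondeterministic), but it removes the easiest source of run-to-run variance.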

Before we write any code, we need to make sure our hardware/software infrastructure is unified. This is especially useful when you are working on a team.
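One lightweight way to keep a team's infrastructure in sync is to log an environment snapshot alongside every experiment, so mismatched setups are easy to diff. Here's a minimal sketch using only the standard library; `environment_snapshot` is a hypothetical helper name, and in practice you would also record GPU/CUDA details and pinned package versions (e.g. the exact `tensorflow` version).

```python
import json
import platform
import sys

def environment_snapshot() -> dict:
    """Hypothetical helper: capture basic software/hardware details.

    Extend this with GPU, CUDA, and framework versions in a real project.
    """
    return {
        "python": sys.version.split()[0],
        "os": platform.platform(),
        "machine": platform.machine(),
    }

# Save this JSON next to your experiment artifacts or log it to your tracker.
print(json.dumps(environment_snapshot(), indent=2))
```

Pinning dependencies (a lockfile or a shared Docker image) plus a snapshot like this goes a long way toward making results comparable across machines.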

Overview of the methods covered


Conclusion

This report was an effort to provide some simple but useful methods that can help you to build reproducible models. This is in no way an exhaustive list. I polled a number of Machine Learning GDEs about their thoughts on reproducibility and here's what they said:

Thanks to Mat, Aakash, and Souradip for their contributions. As ML practitioners, maximum reproducibility should always be our goal, in addition to SOTA results.

I would love to know what reproducibility tools/methods you use. If you have any feedback on the report, don't hesitate to tweet me at @RisingSayak.