ML Experiment tracking and model management using WandB
As part of MLOps Assignment 3, I developed a simple NN model for handwritten digit recognition over the MNIST dataset.
One of the hyperparameters for the model was the learning rate (lr). I swept this over the values 0.01, 0.001, and 0.0001 to identify the optimal value.
I saved the best model as a versioned artifact.
This is a report that summarizes this exercise.
Training the NN model
The model is a simple neural network with two fully connected layers, in the configuration: Input -> FC1 -> ReLU -> FC2 -> Output. It was trained for 5 epochs.
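A minimal sketch of such a two-layer network in PyTorch is shown below. The class name, hidden layer size, and variable names are illustrative assumptions; only the FC1 -> ReLU -> FC2 structure and the MNIST input/output dimensions come from the description above.

```python
import torch.nn as nn

# Sketch of the two-layer architecture described above:
# Input (flattened 28x28 image) -> FC1 -> ReLU -> FC2 -> Output (10 classes).
# The hidden size of 128 is an assumption, not the assignment's actual value.
class SimpleNN(nn.Module):
    def __init__(self, hidden_size=128):
        super().__init__()
        self.fc1 = nn.Linear(28 * 28, hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, 10)

    def forward(self, x):
        x = x.view(x.size(0), -1)  # flatten the 28x28 image
        return self.fc2(self.relu(self.fc1(x)))
```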
Hyperparameter Exploration
As advised, the WandB Sweep capability was used to sweep the learning rate (lr) parameter through the values 0.01, 0.001, and 0.0001.
The value lr = 0.001 demonstrated the best performance in terms of training and test accuracy.
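A sketch of what the sweep configuration might look like is given below. The sweep method, metric name, project name, and the train() entry point are assumptions; only the three lr values are taken from the actual experiment.

```python
import wandb

# Sketch of a W&B sweep over the three learning rates described above.
# "grid", "val_accuracy", and the project name are assumptions.
sweep_config = {
    "method": "grid",
    "metric": {"name": "val_accuracy", "goal": "maximize"},
    "parameters": {
        "lr": {"values": [0.01, 0.001, 0.0001]},
    },
}

sweep_id = wandb.sweep(sweep_config, project="mnist-mlops-assignment3")
# wandb.agent(sweep_id, function=train)  # train() would run one configuration per lr value
```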
Best Model - Artifact
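The best model was saved as a versioned artifact. A minimal sketch of how this can be done with the W&B Artifacts API follows; the artifact name, checkpoint file path, and metadata keys are assumptions, while the lr, epoch count, and validation accuracy values are taken from the results below.

```python
import wandb

# Sketch: log the best checkpoint as a versioned W&B artifact.
# The artifact name and file path are assumptions.
run = wandb.init(project="mnist-mlops-assignment3", job_type="upload-model")
artifact = wandb.Artifact(
    "mnist-best-model",
    type="model",
    metadata={"lr": 0.001, "epochs": 5, "val_accuracy": 0.9642},
)
artifact.add_file("best_model.pth")
run.log_artifact(artifact)
run.finish()
```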
Observations
- Holding the batch size and number of epochs constant (64 and 5 respectively), a learning rate of 0.001 gave the best model performance, with a validation accuracy of 96.42%.
- A larger learning rate (0.01) led to a marginally lower validation accuracy (93.65%); validation loss plateaued around 0.23.
- A smaller learning rate (0.0001) led to a lower validation accuracy (92.49%). Validation loss was higher than in the best run, but the trend indicated it would continue to decrease if the number of epochs were increased.
- As evident from the following trend charts, which compare a run with (lr 0.01, epochs 5) against one with (lr 0.001, epochs 10), a smaller learning rate combined with an appropriately larger number of epochs yields more stable training and better performance; however, this comes at the cost of training time (CPU). If the model is coded efficiently, the impact on memory is negligible.
Artifact Management Motivations
As part of the MLOps coursework and assignments, I developed useful insights into why artifact management is critical for ensuring reproducibility and version control in ML development and deployment environments.
Reproducibility
In Machine Learning (ML) research and applications, it is fundamentally important to be able to run a given experiment with specified inputs, model code, and environment, and obtain the same results. This reproducibility ensures that ML models and research findings can be consistently verified. It is even more important in a rapidly evolving field like ML/AI, where new innovations are quickly built on top of past, trusted work.
With artifact management, the key components of a researched model (datasets, model code and tuned weights, hyperparameters, and dependencies) can be tagged and tracked. This enables seamless collaboration between the participating development, testing, and integration teams, since everyone works from a well-defined source. It also eases deployment into new environments.
Version Control
ML workflows typically involve multiple artifacts: datasets, foundation models layered with application-specific code, hyperparameters, and dependencies. Keeping track of the versions of these artifacts (i.e., version control) ensures reproducibility, eases collaboration, accelerates debugging, and helps audit models for compliance.
ML teams typically work in parallel across different functions (say, data processing, model training, and deployment integration), and often different geographies. Version control ensures that they can communicate and collaborate effectively.
Also, as ML models evolve and get refined, version control allows teams to track the changes between model versions, and develop insights on factors which helped a model get better or get degraded.
In production environments, models are exposed to unseen data and sometimes unexpected usage patterns. If a newly deployed model version runs into issues in production, version control provides an efficient mechanism to roll back to a previous stable version.
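For illustration, restoring a known-good earlier version of a logged model through the W&B Artifacts API might look like the following sketch. The project, artifact name, and version alias ("v2") are assumptions, not values from the actual assignment.

```python
import wandb

# Sketch: roll back by downloading a previous, known-stable artifact version.
# "mnist-best-model:v2" is an assumed name and version alias.
run = wandb.init(project="mnist-mlops-assignment3", job_type="rollback")
artifact = run.use_artifact("mnist-best-model:v2", type="model")
model_dir = artifact.download()  # local directory containing the stored checkpoint
run.finish()
```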