Prediction Training Dashboard
Goals
This dashboard documents the model training code, experiments, and tracked metrics. We analyze the metrics tracked across multiple trials and present our key findings on the prediction model.
For complete reproducibility, the code used to train the model is documented below.
Code
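The following is a minimal, runnable sketch of the training loop rather than the production code: the stand-in model, the synthetic raster tensors, and the `prediction-training` project name are placeholders, while the hyperparameter values mirror the trials analyzed below.

```python
# Minimal sketch of the training loop. The model and data below are
# stand-ins for the real architecture and raster pipeline.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
import wandb

config = dict(learning_rate=6e-4, batch_size=24, train_steps=5000,
              raster_type="semantic")  # values mirror the analysis below
wandb.init(project="prediction-training", config=config)  # placeholder project

# Stand-in model and synthetic "rasters" so the sketch runs end to end.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 1))
rasters, targets = torch.randn(1024, 3, 64, 64), torch.randn(1024, 1)
loader = DataLoader(TensorDataset(rasters, targets),
                    batch_size=config["batch_size"], shuffle=True)

optimizer = torch.optim.Adam(model.parameters(), lr=config["learning_rate"])
criterion = nn.MSELoss()

step = 0
while step < config["train_steps"]:
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
        wandb.log({"train/loss": loss.item()}, step=step)  # tracked metric
        step += 1
        if step >= config["train_steps"]:
            break
```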
Analysis of Grouped Trials from Latest Run
Key insights
- Lower learning rates (~0.0006) yield lower avg loss (~10 MSE).
- Training on the semantic rasters usually yields lower avg loss than training on the satellite rasters.
- The model loss stabilizes after roughly 5,000 training steps.
- Higher learning rates (>0.006) result in the avg loss (>50 MSE) failing to stabilize during training (see the sketch after this list for reproducing the grouped comparison).
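As a rough illustration of how this grouped comparison can be reproduced, the sketch below pulls the runs with the wandb public API and bins avg loss by learning rate and raster type; the `my-entity/prediction-training` path and the `train/loss` summary key are assumptions.

```python
import pandas as pd
import wandb

api = wandb.Api()
runs = api.runs("my-entity/prediction-training")  # placeholder entity/project

rows = [
    {
        "learning_rate": run.config.get("learning_rate"),
        "raster_type": run.config.get("raster_type"),
        "avg_loss": run.summary.get("train/loss"),  # assumed loss key
    }
    for run in runs
]
df = pd.DataFrame(rows)

# Bin learning rates around the thresholds discussed above and compare
# the mean avg loss per (raster type, learning-rate bin) group.
df["lr_bin"] = pd.cut(df["learning_rate"], bins=[0, 9e-4, 6e-3, 1.0])
print(df.groupby(["raster_type", "lr_bin"], observed=True)["avg_loss"].mean())
```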
Training Metrics
Run set: 56 runs
Key insights
- The largest batch size in our experiments (24) results in roughly 80% GPU utilization, so we could likely increase the batch size further to get more out of the available compute (the sketch below shows how to check the logged utilization per run).
- Higher learning rates cause instability in training.
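To verify the utilization headroom per batch size, the GPU utilization that W&B records automatically as a system metric can be pulled from each run's events stream; the project path is again a placeholder.

```python
import wandb

api = wandb.Api()
for run in api.runs("my-entity/prediction-training"):  # placeholder path
    sys_metrics = run.history(stream="events")  # auto-logged system metrics
    if "system.gpu.0.gpu" in sys_metrics:
        util = sys_metrics["system.gpu.0.gpu"].mean()
        print(run.config.get("batch_size"), f"-> {util:.0f}% GPU utilization")
```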
Tuned Hyperparameter Analysis
Run set: 56 runs
Key insights
- The plot above shows that Ray Tune's Optuna search iteratively narrows the hyperparameter search space to quickly find the best set of hyperparameters (a sketch of the search setup follows this list).
- Lower learning rates (<0.0009) combined with a higher number of training steps yield lower average loss.
- Larger batch sizes in combination with lower learning rates yield lower average loss.
- There is very little correlation between raster type (semantic vs. satellite) and average loss.
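For reference, here is a sketch of a Ray Tune + Optuna search consistent with the plots above; the search-space bounds echo the ranges discussed in these insights, and `train_prediction_model` is a hypothetical stand-in for the real trainable.

```python
from ray import tune
from ray.tune.search.optuna import OptunaSearch

def train_prediction_model(config):
    # Hypothetical stand-in: the real trainable would run the training
    # loop from the Code section and return the final average MSE.
    fake_avg_loss = config["learning_rate"] * 1e4 / config["batch_size"]
    return {"avg_loss": fake_avg_loss}

search_space = {
    "learning_rate": tune.loguniform(1e-5, 1e-2),
    "batch_size": tune.choice([8, 16, 24]),
    "train_steps": tune.choice([2500, 5000, 10000]),
    "raster_type": tune.choice(["semantic", "satellite"]),
}

tuner = tune.Tuner(
    train_prediction_model,
    param_space=search_space,
    tune_config=tune.TuneConfig(
        search_alg=OptunaSearch(metric="avg_loss", mode="min"),
        num_samples=56,  # matches the 56 runs in the run set above
    ),
)
results = tuner.fit()
print(results.get_best_result(metric="avg_loss", mode="min").config)
```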