
Impact of Learning Rate on Model Performance

In this experiment, I explore the effect of varying learning rates on the performance of a simple fully connected neural network (FCN) trained on the Fashion-MNIST dataset. I tested three learning rates, 0.001, 0.0005, and 0.0001, to understand how each rate affects training and validation performance.
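To make the setup concrete, below is a minimal sketch of the kind of model and training loop this experiment describes. The exact architecture, optimizer, and batch size are not recorded in this report, so the layer sizes, Adam optimizer, and batch size of 64 here are assumptions, not the actual configuration.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

class FCN(nn.Module):
    # Simple fully connected network for 28x28 Fashion-MNIST images (layer sizes assumed).
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28 * 28, 256),
            nn.ReLU(),
            nn.Linear(256, 10),
        )

    def forward(self, x):
        return self.net(x)

transform = transforms.ToTensor()
train_data = datasets.FashionMNIST("data", train=True, download=True, transform=transform)
train_loader = DataLoader(train_data, batch_size=64, shuffle=True)

# One run per learning rate, 5 epochs each, matching the grid described above.
for lr in (0.001, 0.0005, 0.0001):
    model = FCN()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)  # optimizer choice is an assumption
    criterion = nn.CrossEntropyLoss()
    for epoch in range(5):
        for images, labels in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
```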
Effective hyperparameter settings found during sweeps (a minimal sweep-configuration sketch appears after the conclusion):
Learning Rate = 0.001:
Epoch: 5
Training Accuracy: 88.45%
Test Accuracy: 87.00%
Training Loss: 0.32107
Observations:
The learning rate of 0.001 yielded the highest test accuracy (87%) compared to other configurations.
The model trained at a reasonable speed, converging well by the 5th epoch with no signs of overfitting or underfitting.
This learning rate strikes a good balance between training time and performance.
Learning Rate = 0.0005:
Epoch: 5
Training Accuracy: 88.12%
Test Accuracy: 86.58%
Training Loss: 0.31332
Observations:
The learning rate of 0.0005 performed slightly worse than 0.001, with small decreases in both training and test accuracy.
Despite this, it still provided stable training and exhibited a slightly lower training loss than the 0.001 learning rate.
The model's performance appears to level off, suggesting that a learning rate of 0.0005 could still be effective, although it may lead to slower convergence.
Learning Rate = 0.0001:
Epoch: 5
Training Accuracy: 85.38%
Test Accuracy: 84.07%
Training Loss: 0.4112
Observations:
The learning rate of 0.0001 led to the lowest test and training accuracy.
Training with this rate converged more slowly and showed the highest training loss of the three configurations.
While the smaller update steps kept training stable, the slow convergence likely prevented the model from reaching its best performance within only 5 epochs.
Comparison and Trade-offs:
The higher learning rate (0.001) yielded the best test accuracy (87%) and converged faster, though larger update steps carry some risk of skipping over fine-grained improvements.
Lower learning rates (0.0005 and 0.0001) were more stable but slower to converge, with 0.0005 offering a good trade-off between speed and stability, and 0.0001 providing the most stable but slowest training.
Based on these observations, learning rate 0.001 seemed to provide the best results in terms of test accuracy, while learning rate 0.0005 showed stability with a slight trade-off in performance.
Conclusion:
Learning Rate 0.001 is the most effective for this model, offering a balance of speed and accuracy.
Learning Rate 0.0005 provides a safer training trajectory but with marginally lower accuracy.
Learning Rate 0.0001 is too slow, leading to poor performance after 5 epochs, and would likely need more epochs to converge.
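For reference, here is a minimal sketch of how the learning-rate sweep mentioned above could be configured with the W&B sweeps API. The project name, metric key, grid method, and the skeleton training function are assumptions for illustration, not details taken from the actual runs.

```python
import wandb

# Hypothetical grid search over the three learning rates tested in this report.
sweep_config = {
    "method": "grid",
    "metric": {"name": "test_accuracy", "goal": "maximize"},  # assumed metric name
    "parameters": {
        "learning_rate": {"values": [0.001, 0.0005, 0.0001]},
        "epochs": {"value": 5},
    },
}

def train():
    # Each agent call starts a run whose config holds one learning rate from the grid.
    run = wandb.init()
    lr = run.config.learning_rate
    # ... build the FCN, train for run.config.epochs with this lr, and log metrics, e.g.:
    # wandb.log({"train_loss": train_loss, "train_accuracy": train_acc, "test_accuracy": test_acc})
    run.finish()

sweep_id = wandb.sweep(sweep_config, project="fashion-mnist-lr")  # project name is assumed
wandb.agent(sweep_id, function=train, count=3)
```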

