Hyperparameter Tuning for a Simple 1D CNN Text Classifier
In this post, we will explore W&B tools to run and track a hyperparameter tuning job.
Created on September 24 | Last edited on September 24
Task Description
Having defined and evaluated an initial baseline model for our consumer-complaints classifier, we now want to search for a better one. Our next step is therefore a hyperparameter tuning job: searching across a range of values for each hyperparameter of the model in order to reach a higher accuracy.
It is worth noting that our model is very simple: a single Conv1D layer with dropout, followed by one Dense layer. Everything therefore suggests there is little room for improvement, but even so we will try to achieve at least a slight gain in results.
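The architecture just described could be sketched in Keras roughly as follows. This is a hypothetical reconstruction, not the post's actual code; the vocabulary size, embedding dimension, and number of classes are assumptions.

```python
# Hypothetical sketch of the baseline: one Conv1D + dropout block
# followed by a single Dense output layer.
VOCAB_SIZE = 20000   # assumed vocabulary size
EMBED_DIM = 128      # assumed embedding dimension
NUM_CLASSES = 10     # assumed number of complaint categories

def build_baseline(filters=64, kernel_size=5, dropout=0.4):
    # Local import so the sketch can be read without TensorFlow installed.
    from tensorflow.keras import layers, models
    model = models.Sequential([
        layers.Embedding(VOCAB_SIZE, EMBED_DIM),
        layers.Conv1D(filters, kernel_size, activation="relu"),
        layers.GlobalMaxPooling1D(),
        layers.Dropout(dropout),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

The `filters`, `kernel_size`, and `dropout` arguments are exactly the knobs the sweep below will search over.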
Sweep Analysis
For this exercise, we define the following search space for our hyperparameters:
lr:
  distribution: log_uniform_values
  min: 1e-5
  max: 1e-2
weight_decay:
  distribution: uniform
  min: 0.7
  max: 0.9
batch_size:
  values: [64, 128, 256]
dropout:
  distribution: uniform
  min: 0.3
  max: 0.5
kernel_size:
  values: [32, 64]
filter_size:
  values: [5, 7]
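A sweep like this can be launched with the W&B Python API along the following lines. This is a minimal sketch: the search method, project name, and the `train()` stub are placeholders and not from the original post.

```python
# Sketch of launching the sweep with the W&B Python API.
# Mirrors the search space defined above.
sweep_config = {
    "method": "bayes",  # assumed search strategy; the post does not state it
    "metric": {"name": "val_accuracy", "goal": "maximize"},
    "parameters": {
        "lr": {"distribution": "log_uniform_values", "min": 1e-5, "max": 1e-2},
        "weight_decay": {"distribution": "uniform", "min": 0.7, "max": 0.9},
        "batch_size": {"values": [64, 128, 256]},
        "dropout": {"distribution": "uniform", "min": 0.3, "max": 0.5},
        "kernel_size": {"values": [32, 64]},
        "filter_size": {"values": [5, 7]},
    },
}

def train():
    """Placeholder training function; each agent call runs one trial."""
    import wandb
    with wandb.init() as run:
        cfg = run.config
        # ... build and train the model using cfg.lr, cfg.dropout, etc. ...
        run.log({"val_accuracy": 0.0})  # replace with the real metric

def launch():
    """Register the sweep and run the agent, capped at 20 trials."""
    import wandb
    sweep_id = wandb.sweep(sweep_config, project="consumer-complaints")
    wandb.agent(sweep_id, function=train, count=20)
```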
We limit the sweep job to a maximum of 20 trials; here is the result:
As we suspected, our best model is only slightly better than the baseline; the accuracy across all trials is very similar.
Best model results
Run: vivid-sweep-13
The validation accuracy of the baseline model is 0.851, while the optimized model reaches 0.853.
Let's check the classification report to confirm that all classes are classified equally well.
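A per-class report can be produced with scikit-learn's `classification_report`. The labels below are toy stand-ins, not the actual sweep predictions.

```python
# Sketch of a per-class classification report for the best model.
from sklearn.metrics import classification_report

y_true = [0, 1, 2, 2, 1, 0]   # toy ground-truth labels
y_pred = [0, 1, 2, 1, 1, 0]   # toy model predictions

# Prints precision, recall, F1, and support for each class.
print(classification_report(y_true, y_pred, digits=3))
```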
Predicted values for the best model
Since we collected the predicted values during the tuning job, we can explore them and may draw some relevant conclusions.
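One simple way to explore the collected predictions is to load them into a DataFrame and surface the misclassified examples. The column names and the data below are illustrative assumptions, not from the original post.

```python
# Sketch of exploring the collected predictions.
import pandas as pd

# Toy stand-in for the predictions collected during the sweep.
preds = pd.DataFrame({
    "text":  ["complaint a", "complaint b", "complaint c"],
    "label": ["credit_card", "mortgage", "debt_collection"],
    "pred":  ["credit_card", "debt_collection", "debt_collection"],
})

# Surface the misclassified complaints for manual inspection.
errors = preds[preds["label"] != preds["pred"]]
print(errors)

# To make them browsable in the W&B UI, one could log them as a table:
# wandb.log({"predictions": wandb.Table(dataframe=preds)})
```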