
Hyperparameter Tuning for a simple CNN 1D Text Classifier

In this post, we will explore W&B tools to perform and track a hyperparameter tuning job
Created on September 24|Last edited on September 24


Task Description

Having defined and evaluated an initial baseline model for our consumer complaints classifier, we now want to search for a better one. Our next step is therefore a hyperparameter tuning job: searching across the range of values of every hyperparameter of the model in order to reach a higher accuracy.
Note that our model is very simple: a single Conv1D layer with a dropout component, followed by one Dense layer. Everything therefore suggests there is little room for improvement; even so, we will try to achieve at least a slight gain in results.
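To make the architecture concrete, here is a minimal numpy sketch of the forward pass of such a model. The dimensions (sequence length, embedding size, number of filters, kernel size, number of classes) are illustrative assumptions, not taken from the post, and dropout is skipped since it is the identity at inference time.

```python
import numpy as np

# Hypothetical dimensions (assumptions for illustration): embedded sequence of
# length 100 with 128-dim embeddings, 64 filters of kernel size 5, 8 classes.
seq_len, emb_dim = 100, 128
n_filters, kernel = 64, 5
n_classes = 8

rng = np.random.default_rng(0)
x = rng.standard_normal((seq_len, emb_dim))           # one embedded sequence
W = rng.standard_normal((kernel, emb_dim, n_filters)) # Conv1D kernels
b = np.zeros(n_filters)

# Conv1D with "valid" padding: slide each kernel along the sequence axis
conv = np.stack([
    np.einsum("ke,kef->f", x[t:t + kernel], W) + b
    for t in range(seq_len - kernel + 1)
])                                                    # shape (96, 64)
relu = np.maximum(conv, 0)
pooled = relu.max(axis=0)                             # global max pool -> (64,)
# (dropout would act here during training; it is the identity at inference)
logits = pooled @ rng.standard_normal((n_filters, n_classes))  # Dense -> (8,)
```

The shapes trace the data flow: each filter produces one activation per window position, max pooling collapses the sequence axis, and the Dense layer maps the pooled features to one logit per complaint category.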

Sweep Analysis

For this exercise, we define the following search space for our hyperparameters:
lr:
distribution: log_uniform_values
min: 1e-5
max: 1e-2
weight_decay:
distribution: uniform
min: 0.7
max: 0.9
batch_size:
values: [64,128,256]
dropout:
distribution: uniform
min: 0.3
max: 0.5
kernel_size:
values: [32,64]
filter_size:
values: [5,7]
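The search space above can be expressed as a W&B-style sweep configuration dict. The following sketch builds that dict and samples one trial by hand to show what each distribution means; the search `method` and `metric` fields are assumptions (the post does not state them), and the `wandb.sweep`/`wandb.agent` launch lines are shown only as comments.

```python
import math
import random

# Sweep config mirroring the search space from the post; "method" and
# "metric" are assumed values, not stated in the original.
sweep_config = {
    "method": "random",
    "metric": {"name": "val_accuracy", "goal": "maximize"},
    "parameters": {
        "lr": {"distribution": "log_uniform_values", "min": 1e-5, "max": 1e-2},
        "weight_decay": {"distribution": "uniform", "min": 0.7, "max": 0.9},
        "batch_size": {"values": [64, 128, 256]},
        "dropout": {"distribution": "uniform", "min": 0.3, "max": 0.5},
        "kernel_size": {"values": [32, 64]},
        "filter_size": {"values": [5, 7]},
    },
}

def sample(params, rng=random):
    """Draw one trial to illustrate each distribution type."""
    trial = {}
    for name, spec in params.items():
        if "values" in spec:                      # categorical choice
            trial[name] = rng.choice(spec["values"])
        elif spec["distribution"] == "log_uniform_values":
            # uniform in log-space between min and max
            trial[name] = math.exp(
                rng.uniform(math.log(spec["min"]), math.log(spec["max"])))
        else:                                     # plain uniform
            trial[name] = rng.uniform(spec["min"], spec["max"])
    return trial

trial = sample(sweep_config["parameters"])

# With wandb installed, the sweep would be launched along these lines
# (project name and train function are hypothetical):
# sweep_id = wandb.sweep(sweep_config, project="consumer-complaints")
# wandb.agent(sweep_id, function=train, count=20)
```

Sampling the learning rate in log-space is what makes values like 1e-5 and 1e-2 equally likely to be explored, which a plain uniform draw would not do.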
We limit the sweep job to a maximum of 20 trials; here is the result:


As we suspected, our best model is only slightly better than the baseline; the accuracy across all trials is very similar.

Best model results



Run: vivid-sweep-13
The validation accuracy of the baseline model is 0.851, while the optimized model reaches 0.853.
Let's check the classification report to confirm that all classes are classified equally well.

[Classification report table: precision, recall, f1-score, and support for categories 1–8]
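A classification report of this kind can be computed directly from the validation predictions. The sketch below uses made-up labels for three classes purely to illustrate the per-class precision/recall/f1/support computation; the real report would use the sweep's validation predictions across all eight categories.

```python
import numpy as np

# Toy labels and predictions (made up for illustration only)
y_true = np.array([0, 0, 1, 1, 1, 2, 2, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 2, 2, 2, 2])

def per_class_metrics(y_true, y_pred, n_classes):
    """Per-class precision, recall, f1, and support from label arrays."""
    rows = {}
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        rows[c] = {"precision": precision, "recall": recall,
                   "f1": f1, "support": int(np.sum(y_true == c))}
    return rows

report = per_class_metrics(y_true, y_pred, n_classes=3)
```

Comparing support against recall per class is the quickest way to spot whether the model is neglecting low-frequency complaint categories.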



Predicted values for the best model

We collected the predicted values during the tuning job; exploring them may reveal some relevant patterns.


[Interactive W&B tables: predicted class vs. product for the best model]
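One useful way to explore the collected predictions is a confusion matrix, from which the most frequently confused pair of classes falls out directly. The labels below are invented for illustration; in practice `y_true`/`y_pred` would come from the prediction tables logged during the sweep.

```python
import numpy as np

# Toy data (assumption for illustration, not from the logged tables)
y_true = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2, 2])
y_pred = np.array([0, 2, 2, 1, 1, 2, 2, 2, 1, 2])

n_classes = 3
conf = np.zeros((n_classes, n_classes), dtype=int)
np.add.at(conf, (y_true, y_pred), 1)   # rows: true class, cols: predicted

# Zero the diagonal and find the most common misclassification
off_diag = conf.copy()
np.fill_diagonal(off_diag, 0)
worst = np.unravel_index(off_diag.argmax(), off_diag.shape)
# worst is the (true, predicted) pair with the most errors
```

Pairing this with the `product` column of the logged tables would show which complaint categories the model tends to conflate, pointing at where additional features or data might help most.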