Automate Hyperparameter Tuning Using Keras-Tuner and W&B
In this article, we take a look at how to integrate Weights & Biases with Keras-Tuner so that we can automate hyperparameter tuning — and save time.
An artificial neural network is made up of weights, biases, and a number of prior design choices. These choices, i.e., the number of neurons, the choice of activation (non-linearity), and the number of layers, are commonly termed 'hyperparameters'.
A vast field of research is devoted to hyperparameter optimization: we are interested in turning not only the knobs of the weights and biases but also those of the hyperparameters. Some great approaches (grid search, random search, and Bayesian optimization, to name a few) have already marked this field.
A large amount of time in deep learning experimentation is spent choosing good hyperparameters, and a good choice can sometimes be a game-changer for the experiment. The topic is widely studied and researched. With the advent of various search algorithms, we can now tune hyperparameters automatically; searching a hyperparameter space automatically has saved DL researchers much of the time they used to spend doing it by hand.
In this article, we will look into one such tool that helps automate hyperparameter tuning: keras-tuner. We will not only cover the basics of the tool but also integrate it with our favourite experiment tracker, wandb.
Check out the Kaggle Notebook
The API of keras-tuner
The Keras team always puts a lot of effort into the API design of their tools, and this tool is no exception.
There are four basic interfaces that the API provides. These interfaces are the heart of the API.
- HyperParameters: This class serves as a hyperparameter container. An instance of this class contains information about the present hyperparameters and the search space in total.
- HyperModel: An instance of this class can be thought of as an object that models the entire hyperparameter space. It not only defines the search space but also builds DL models by sampling from it.
- Oracles: Each instance of this class implements a particular hyperparameter tuning algorithm.
- Tuners: A Tuner instance does the hyperparameter tuning. An Oracle is passed as an argument to a Tuner. The Oracle tells the Tuner which hyperparameters should be tried next.
The top-down approach to the API design makes it readable and easy to understand. To reiterate:
- Build HyperParameters objects;
- Pass the HyperParameters to the Hypermodel that can then build the search space;
- Build Oracles, which provide the tuning algorithms;
- Build Tuners that tune the hyperparameters according to the Oracles.
Code with keras-tuner
In this section, I will try to explain the basic usage of keras-tuner with an example. The example is taken from their own documentation.
Leaving aside the imports that are necessary to run the tuner, we first need to build the Hypermodel that will emulate the entire search space.
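(For completeness, a typical set of imports for the snippets below might look like this. This is a sketch based on the standalone kerastuner package; in newer releases the package is named keras_tuner, so the exact import paths may differ.)

from tensorflow import keras
from tensorflow.keras import layers

# keras-tuner building blocks used in the examples below
from kerastuner import HyperModel
from kerastuner.tuners import RandomSearch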
We can build a Hypermodel in two ways:
- Build models with a function
- Subclass the HyperModel class
Function
Here we build a function that takes HyperParameters as an argument. The function samples from the HyperParameters, builds a model, and returns it. This way, different models are drawn from the search space.
# build with function
def build_model(hp):
    model = keras.Sequential()
    model.add(layers.Dense(units=hp.Int('units',
                                        min_value=32,
                                        max_value=512,
                                        step=32),
                           activation='relu'))
    model.add(layers.Dense(10, activation='softmax'))
    model.compile(
        optimizer=keras.optimizers.Adam(
            hp.Choice('learning_rate', values=[1e-2, 1e-3, 1e-4])),
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy'])
    return model
Subclassing the HyperModel class
With this method, one needs to override the build() method, inside which the user can sample from the HyperParameters and build suitable models.
# build with inheritance
class MyHyperModel(HyperModel):
    def __init__(self, num_classes):
        self.num_classes = num_classes

    def build(self, hp):
        model = keras.Sequential()
        model.add(layers.Dense(units=hp.Int('units',
                                            min_value=32,
                                            max_value=512,
                                            step=32),
                               activation='relu'))
        model.add(layers.Dense(self.num_classes, activation='softmax'))
        model.compile(
            optimizer=keras.optimizers.Adam(
                hp.Choice('learning_rate', values=[1e-2, 1e-3, 1e-4])),
            loss='sparse_categorical_crossentropy',
            metrics=['accuracy'])
        return model
In both cases, a Hypermodel is created from the HyperParameters provided to it. Interested readers are advised to look into the way the hyperparameters are sampled: the package provides not only static choices but also conditional hyperparameters.
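As a quick illustration of a conditional hyperparameter, here is a minimal sketch (the function name compile_with_hp and the hyperparameter names optimizer and momentum are made up for this example) where a momentum value is only sampled when the sgd optimizer is chosen:

from tensorflow import keras

def compile_with_hp(model, hp):
    optimizer_name = hp.Choice('optimizer', ['adam', 'sgd'])
    if optimizer_name == 'sgd':
        # `conditional_scope` marks `momentum` as active only when
        # the parent hyperparameter `optimizer` equals 'sgd'.
        with hp.conditional_scope('optimizer', ['sgd']):
            optimizer = keras.optimizers.SGD(momentum=hp.Float('momentum', 0.0, 0.9))
    else:
        optimizer = keras.optimizers.Adam()
    model.compile(optimizer=optimizer,
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model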
After we have our Hypermodel ready, it is time to build the Tuner. The Tuner searches the hyperparameter space and gives us the most optimised set of hyperparameters. Below I have written the tuners for both Hypermodel settings.
# tuner for function
tuner = RandomSearch(
    build_model,
    objective='val_accuracy',
    max_trials=5,
    executions_per_trial=3,
    directory='my_dir',
    project_name='helloworld')

# tuner for subclass
hypermodel = MyHyperModel(num_classes=10)
tuner = RandomSearch(
    hypermodel,
    objective='val_accuracy',
    max_trials=10,
    directory='my_dir',
    project_name='helloworld')
Note: With a custom Tuner, one needs to pass the tuner an Oracle that supplies the search algorithm (we will see this pattern in the integration section below).
With everything set, we are good to run the search. The search method follows the same design as the fit method. After the search, we can query the tuner for the best models and the best hyperparameters.
tuner.search(x, y,
             epochs=5,
             validation_data=(val_x, val_y))
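Once the search finishes, the tuner can be queried for its results. A minimal sketch using the library's query methods:

# Retrieve the best model and the best set of hyperparameters
best_model = tuner.get_best_models(num_models=1)[0]
best_hp = tuner.get_best_hyperparameters(num_trials=1)[0]
print(best_hp.values)     # dictionary of the winning hyperparameter values

# Print a summary of all trials
tuner.results_summary()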
Code to Integrate keras-tuner with wandb
Check out the Kaggle Notebook
How cool would it be to track all the models in one place along with keras-tuner? Here we integrate wandb with keras-tuner to track every model that is created and searched through. This not only helps with retrieving the best model but also provides some highly valuable insights.
Hypermodel
Here we take the functional approach to building the Hypermodel, which is an extremely easy way to build models.
In this example, one can see conditional hyperparameters in use: a for loop creates a tunable number of convolutional layers (conv_layers), each of which has its own tunable filters and kernel_size parameters.
def build_model(hp):
    """Builds a convolutional model.

    Args:
        hp: HyperParameters object. This is the object that helps
            us sample hyperparameters for a particular trial.

    Returns:
        model: Keras model.
    """
    inputs = tf.keras.Input(shape=(28, 28, 1))
    x = inputs
    # In this example we also get to look at
    # conditional hyperparameter settings.
    # Here the `kernel_size` is conditioned
    # with the for loop counter.
    for i in range(hp.Int('conv_layers', 1, 3)):
        x = tf.keras.layers.Conv2D(
            filters=hp.Int('filters_' + str(i), 4, 32, step=4, default=8),
            kernel_size=hp.Int('kernel_size_' + str(i), 3, 5),
            activation='relu',
            padding='same')(x)

        # choosing between max pool and avg pool
        if hp.Choice('pooling' + str(i), ['max', 'avg']) == 'max':
            x = tf.keras.layers.MaxPooling2D()(x)
        else:
            x = tf.keras.layers.AveragePooling2D()(x)

        x = tf.keras.layers.BatchNormalization()(x)
        x = tf.keras.layers.ReLU()(x)

    if hp.Choice('global_pooling', ['max', 'avg']) == 'max':
        x = tf.keras.layers.GlobalMaxPooling2D()(x)
    else:
        x = tf.keras.layers.GlobalAveragePooling2D()(x)

    outputs = tf.keras.layers.Dense(10, activation='softmax')(x)
    model = tf.keras.Model(inputs, outputs)
    return model
Tuner
Integrating the tuner to log the config and loss with wandb was a piece of cake. The API allows the user to override the run_trial method of the kt.Tuner class. In run_trial, one can harness the HyperParameters object to pass the current hyperparameters as the config of a wandb run. Not only can we now log the metrics of the models, but we can also compare the hyperparameters with the help of the great widgets that wandb provides in its dashboard.
class MyTuner(kt.Tuner):
    """Custom Tuner subclassed from `kt.Tuner`"""

    def run_trial(self, trial, train_ds):
        """The overridden `run_trial` function

        Args:
            trial: The trial object that holds information for the
                current trial.
            train_ds: The training data.
        """
        hp = trial.hyperparameters

        # Batching the data
        train_ds = train_ds.batch(
            hp.Int('batch_size', 32, 128, step=32, default=64))

        # The models that are created
        model = self.hypermodel.build(trial.hyperparameters)

        # Learning rate for the optimizer
        lr = hp.Float('learning_rate', 1e-4, 1e-2, sampling='log', default=1e-3)
        if hp.Choice('optimizer', ['adam', 'sgd']) == 'adam':
            optimizer = tf.keras.optimizers.Adam(lr)
        else:
            optimizer = tf.keras.optimizers.SGD(lr)

        epoch_loss_metric = tf.keras.metrics.Mean()

        # build the train_step
        @tf.function
        def run_train_step(data):
            """The run step

            Args:
                data: the data that needs to be fit

            Returns:
                loss: Returns the loss for the present batch
            """
            images = tf.dtypes.cast(data['image'], 'float32') / 255.
            labels = data['label']
            with tf.GradientTape() as tape:
                logits = model(images)
                loss = tf.keras.losses.sparse_categorical_crossentropy(labels, logits)
            gradients = tape.gradient(loss, model.trainable_variables)
            optimizer.apply_gradients(zip(gradients, model.trainable_variables))
            epoch_loss_metric.update_state(loss)
            return loss

        # WANDB INITIALIZATION
        # Here we pass the configuration so that
        # the runs are tagged with the hyperparams.
        # This also directly means that we can
        # use the different comparison UI widgets in the
        # wandb dashboard off the shelf.
        run = wandb.init(entity='ariG23498', project='keras-tuner', config=hp.values)

        for epoch in range(10):
            self.on_epoch_begin(trial, model, epoch, logs={})
            for batch, data in enumerate(train_ds):
                self.on_batch_begin(trial, model, batch, logs={})
                batch_loss = run_train_step(data)
                self.on_batch_end(trial, model, batch, logs={'loss': batch_loss})

                if batch % 100 == 0:
                    loss = epoch_loss_metric.result().numpy()
                    # Log the batch loss for WANDB
                    run.log({f'e{epoch}_batch_loss': loss})

            # Epoch loss logic
            epoch_loss = epoch_loss_metric.result().numpy()
            # Log the epoch loss for WANDB
            run.log({'epoch_loss': epoch_loss, 'epoch': epoch})

            # `on_epoch_end` has to be called so that
            # we can send the logs to the `oracle`, which handles the tuning.
            self.on_epoch_end(trial, model, epoch, logs={'loss': epoch_loss})
            epoch_loss_metric.reset_states()

        # Finish the wandb run
        run.finish()
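For completeness, here is a rough sketch of how such a custom tuner might be instantiated and run. The data loading, oracle settings, directory, and project name below are assumptions for illustration and are not taken from the article; as noted earlier, a custom Tuner is paired with an Oracle that supplies the search algorithm.

import kerastuner as kt
import tensorflow_datasets as tfds

# MNIST from TFDS yields dictionaries with 'image' and 'label' keys,
# which matches what `run_train_step` expects above.
train_ds = tfds.load('mnist', split='train')

tuner = MyTuner(
    # The Oracle implements the search algorithm. (In newer keras_tuner
    # releases this class is called BayesianOptimizationOracle.)
    oracle=kt.oracles.BayesianOptimization(
        objective=kt.Objective('loss', 'min'),
        max_trials=10),
    hypermodel=build_model,
    directory='results',
    project_name='mnist_keras_tuner_wandb')

# `search` forwards its arguments to the overridden `run_trial`.
tuner.search(train_ds=train_ds)
best_hp = tuner.get_best_hyperparameters(num_trials=1)[0]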
Conclusion
I would advise my readers to quickly spin up a notebook and try this great tool for themselves. For future reference, one can go and read the excellent keras-tuner docs.
The topic of hyperparameter tuning is so vastly researched that people have also tried incorporating genetic algorithms, using the concept of evolving models much like we creatures evolve. A shameless plug here: interested readers can check out one of my articles that deconstructs the concept of hyperparameter tuning with genetic algorithms.