Use Mixed Precision Training
Double your batch size with mixed precision.
I am developing models for the Plant Pathology 2021 - FGVC8 Kaggle competition and needed to train EfficientNet-B4 at an image resolution of 512x512. I could only train with a batch size of 8, and each epoch took roughly 12 minutes on a single V100 GPU.
Kaggle competitions call for quick model development and experimentation, so I tried out mixed precision training. I followed this TensorFlow tutorial to quickly add this training strategy to my pipeline.
I am now able to train with a batch size of 16, and each epoch takes roughly 6 minutes.
Here's how you can use Mixed Precision Training
With a training notebook/script in place, simply add the following lines.
1. Import
from tensorflow.keras import mixed_precision
2. Set the dtype policy
mixed_precision.set_global_policy('mixed_float16')
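If you want to verify the policy took effect, you can inspect it; under mixed_float16, layers compute in float16 while keeping their variables in float32:
policy = mixed_precision.global_policy()
print('Compute dtype:', policy.compute_dtype)    # float16
print('Variable dtype:', policy.variable_dtype)  # float32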
3. Slight modification to your model
In your model, you only have to modify the output layer. A typical classification head (output layer) looks like this:
outputs = tf.keras.layers.Dense(num_labels, activation='softmax')(x)
Modify it so that the activation is a separate layer (it has no trainable parameters) with a "float32" dtype:
outputs = tf.keras.layers.Dense(num_labels)(x)
outputs = tf.keras.layers.Activation('softmax', dtype='float32')(outputs)
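For context, here is a minimal sketch of how the modified head slots into a full model, assuming the EfficientNet-B4 backbone and 512x512 inputs from the setup described above; num_labels is a placeholder for your class count.
import tensorflow as tf
from tensorflow.keras import mixed_precision

mixed_precision.set_global_policy('mixed_float16')

num_labels = 12  # placeholder; set this to your number of classes

# Backbone and Dense layer compute in float16 under the mixed_float16 policy
backbone = tf.keras.applications.EfficientNetB4(
    include_top=False, weights='imagenet', pooling='avg')

inputs = tf.keras.Input(shape=(512, 512, 3))
x = backbone(inputs)
outputs = tf.keras.layers.Dense(num_labels)(x)
# Keep the final softmax in float32 for numerical stability
outputs = tf.keras.layers.Activation('softmax', dtype='float32')(outputs)
model = tf.keras.Model(inputs, outputs)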
4. Train with model.fit
Loss scaling is required to train the model without numerical underflow or overflow. If you are training with model.fit(), loss scaling is handled automatically.
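As a sketch (the optimizer, loss, dataset names, and epoch count below are placeholders for whatever your pipeline uses), compiling and fitting looks the same as usual; Keras wraps the optimizer in a LossScaleOptimizer for you when the mixed_float16 policy is active:
model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss='categorical_crossentropy',
    metrics=['accuracy'])

# train_ds and val_ds are assumed tf.data.Dataset pipelines
model.fit(train_ds, validation_data=val_ds, epochs=10)
If you write a custom training loop instead, wrap the optimizer in mixed_precision.LossScaleOptimizer and use its get_scaled_loss and get_unscaled_gradients methods to apply loss scaling yourself.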
Note that you will hardly lose any model accuracy by training with mixed precision.
To learn more, check out this W&B blog post by Sayak Paul titled Mixed precision training with tf.keras. You can also read the paper here.