Transfer Learning With the EfficientNet Family of Models
In this article, we'll learn to use the EfficientNet family of models for transfer learning in TensorFlow using TFHub.
In this article, we'll look at how to use the EfficientNet family of models for transfer learning on image classification tasks. We'll be using the EfficientNet variants ranging from b0 to b3, and for comparison purposes, the MobileNetV2 model.
This article isn't going to talk about the nitty-gritty of the EfficientNet family of models. If you're interested in learning about the details of those models, you should absolutely check out this amazing report.
This report is accompanied by a Colab Notebook so that you can reproduce the results.
Table of Contents
- Experimental Configuration
- TensorFlow Hub
- Dataset
- Utility Function for Utilizing TF Hub Models for Transfer Learning
- EfficientNet B0 + Custom Classification Top
- EfficientNet [B1, B2, B3] + Custom Classification Top
- Comparison With MobileNetV2
- A Broader View of the Model Training Times
- Concluding Remarks
Experimental Configuration
TensorFlow Hub
All the models we will be using for the experiments come from TensorFlow Hub. TensorFlow Hub provides a comprehensive collection of pre-trained models that can be used for transfer learning, and many of them support fine-tuning as well. TensorFlow Hub has models for a number of different domains, including image, text, video, and audio. Models are also available in different TensorFlow product formats, including TensorFlow Lite, TensorFlow.js, and so on.
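To give a feel for the API, here's a minimal sketch of loading a pre-trained feature extractor through hub.KerasLayer. The URL and the 224x224 input resolution follow the b0 conventions we'll use later in this article:

```python
import tensorflow as tf
import tensorflow_hub as hub

# Load a pre-trained image feature extractor as a Keras layer;
# the weights are frozen by default (trainable=False)
feature_extractor = hub.KerasLayer(
    "https://tfhub.dev/google/efficientnet/b0/feature-vector/1",
    input_shape=(224, 224, 3),
)

# The layer maps a batch of images to a batch of feature vectors
features = feature_extractor(tf.zeros((1, 224, 224, 3)))
print(features.shape)  # (1, 1280) for the b0 variant
```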
Dataset
We will be using the Cats vs. Dogs dataset, which is already included in TensorFlow Datasets, so much of the hard work is already done for us. The code listing below downloads (if not already cached) and loads the dataset, split into training and validation sets according to the proportions we specify.
```python
(raw_train, raw_validation), metadata = tfds.load(
    'cats_vs_dogs',
    split=['train[:80%]', 'train[80%:]'],
    with_info=True,
    as_supervised=True
)
```
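Before training, the raw images need to be resized to a fixed resolution and scaled to the [0, 1] range that TF Hub image models expect. Here's a minimal preprocessing sketch; IMG_SIZE, BATCH_SIZE, and the shuffle buffer size are assumptions on my part (224 is the native b0 resolution):

```python
import tensorflow as tf

IMG_SIZE = 224   # native input resolution of the b0 variant (assumed)
BATCH_SIZE = 32  # assumed batch size

def format_example(image, label):
    # Scale pixel values to [0, 1] and resize to a fixed resolution
    image = tf.cast(image, tf.float32) / 255.0
    image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE))
    return image, label

train = raw_train.map(format_example).shuffle(1024).batch(BATCH_SIZE)
validation = raw_validation.map(format_example).batch(BATCH_SIZE)
```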
Utility Function for Utilizing TF Hub Models for Transfer Learning
Most of the image classification-based TF Hub models come in the following two variants:
- One for running off-the-shelf image classification.
- One without any classification top. This variant is used for feature extraction or, as it's often put, for precomputing bottlenecks.
All of these models are pre-trained on the ImageNet dataset. As we will be using transfer learning, we will be going with the second variant of models. One very important thing to note here is that not all of these models can be fine-tuned, especially the ones based on TensorFlow 1.
Unfortunately, the EfficientNet family of models is not eligible for fine-tuning in this experimental configuration (the modules come in the TensorFlow 1 hub.Module format). The code listing below provides a utility function that downloads the respective feature extraction model, adds a classification top, compiles the final model, and returns it.
```python
def get_training_model(url, trainable=False):
    # Load the respective EfficientNet model but exclude the classification layers
    extractor = hub.KerasLayer(url, input_shape=(IMG_SIZE, IMG_SIZE, 3),
                               trainable=trainable)

    # Construct the head of the model that will be placed on top of
    # the base model
    model = tf.keras.models.Sequential([
        extractor,
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(1)
    ])

    # Compile and return the model
    model.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
                  optimizer="adam",
                  metrics=["accuracy"])
    return model
```
Note the url argument. For feature extractor networks based on EfficientNets, it generally looks like https://tfhub.dev/google/efficientnet/<variant>/feature-vector/1, where <variant> can be anything from b0 to b7. Although the utility function has a trainable argument, for the EfficientNet models in TF Hub, specifying trainable=True yields the following error:
```
ValueError: in user code:

    /usr/local/lib/python3.6/dist-packages/tensorflow_hub/keras_layer.py:206 call  *
        self._check_trainability()
    /usr/local/lib/python3.6/dist-packages/tensorflow_hub/keras_layer.py:265 _check_trainability  *
        raise ValueError(

    ValueError: Setting hub.KerasLayer.trainable = True is unsupported when loading from the hub.Module format of TensorFlow 1.
```
In the next few sections, we will perform transfer learning with four different variants (b0 to b3) of the EfficientNet family of models and analyze their performance.
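As a rough sketch of how these runs could be orchestrated (the epoch count, the project name, and the use of W&B's Keras callback are assumptions on my part, not lifted from the accompanying notebook):

```python
import wandb
from wandb.keras import WandbCallback

for variant in ["b0", "b1", "b2", "b3"]:
    url = f"https://tfhub.dev/google/efficientnet/{variant}/feature-vector/1"
    # Note: each variant has its own native input resolution
    # (b0: 224, b1: 240, b2: 260, b3: 300), so IMG_SIZE may need adjusting
    wandb.init(project="efficientnet-tfhub", name=f"efficientnet-{variant}",
               reinit=True)
    model = get_training_model(url)
    model.fit(train, validation_data=validation, epochs=10,
              callbacks=[WandbCallback()])
    wandb.finish()
```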
EfficientNet B0 + Custom Classification Top
[W&B run set panel: 1 run]
As we can see, the network does not show too unstable a training behavior. The following shows the memory footprint of this model:
```
$ ls -lh b0.h5
-rw-r--r-- 1 root root 18M Apr 11 14:19 b0.h5
```
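For reference, a checkpoint like this can be produced by serializing the trained model with Keras's HDF5 format; a one-line sketch:

```python
# Save the full model (architecture + weights) as an HDF5 file
model.save("b0.h5")
```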
For brevity, let's visualize the training behavior of the remaining three models, based on the b1, b2, and b3 variants respectively, together.
EfficientNet [B1, B2, B3] + Custom Classification Top
[W&B run set panel: 4 runs]
You might have already noticed that as we increased the model capacity (`b0` being the lightest model and `b3` the heaviest in our case), the performance kept degrading. Quoting Ajay's aforementioned report:
Ok, so you probably have a fairly good idea of the computational cost of different EfficientNets by now.
But we still haven't addressed the most disturbing question: why didn't compound scaling work?
Specifically, we should have seen at least consistent performance across models, even if there wasn't an accuracy increase. So why did the bigger models perform worse? Here are a few possible reasons:
- Hyperparameters: it's well known that the same hyperparameters don't work for all models, otherwise we'd all just use the globally "optimal" learning rate, batch size, etc. It could be the case that the larger models require higher/lower learning rates to perform well.
- Overparameterization: The largest EfficientNet we used, EfficientNetB7, has over 60 million parameters. That's a lot for a small dataset like ImageNette, and it's likely that the larger models had many more parameters than necessary.
- Regularization: would have probably helped control the overparameterization issue. But adding regularization only to the large models would lead to unfair comparisons.
I really couldn't have explained the phenomenon better than Ajay already did in his report. If you wanted to probe the hyperparameter hypothesis above yourself, a small learning-rate sweep would be an easy first experiment. Below is a minimal sketch reusing our get_training_model utility; the learning rates, epoch count, and recompilation step are assumptions on my part, not taken from the accompanying notebook.
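```python
import tensorflow as tf

# Hypothetical learning-rate sweep for the b3-based model
for lr in [1e-2, 1e-3, 1e-4]:
    model = get_training_model(
        "https://tfhub.dev/google/efficientnet/b3/feature-vector/1"
    )
    # Recompile with an explicit learning rate instead of Adam's default
    model.compile(
        loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
        optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
        metrics=["accuracy"],
    )
    model.fit(train, validation_data=validation, epochs=10)
```
Now, to measure how well these models really stack up, we will compare them with MobileNetV2 in the next section.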
Comparison With MobileNetV2
Everything remains the same in this case, except that we could now make use of fine-tuning as well. For comparison's sake, though, we will only be using transfer learning here.
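Building this model reuses the same utility function, just with a different TF Hub handle. A minimal sketch; the exact module version in the URL and the epoch count are assumptions on my part:

```python
# MobileNetV2 feature extractor from TF Hub (module version assumed)
mobilenet_url = "https://tfhub.dev/google/tf2-preview/mobilenet_v2/feature_vector/4"

# This TF2 module supports trainable=True, but we keep it frozen
# to match the transfer learning setup used for the EfficientNets
mobilenet_model = get_training_model(mobilenet_url, trainable=False)
mobilenet_model.fit(train, validation_data=validation, epochs=10)
```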
As we can see, in terms of the losses, our MobileNetV2-based model is doing way better than the b0-based model. Note that we did not fine-tune in this case. We compared only with the b0-based model because it was the best performing one in our previous experiments.
In terms of memory footprint as well, this MobileNetV2-based network wins:
```
$ ls -lh mobilenet_v2_no_ft.h5
-rw-r--r-- 1 root root 11M Apr 11 16:09 mobilenet_v2_no_ft.h5
```
[W&B run set panel: 2 runs]
A Broader View of the Model Training Times
As we can see, the MobileNetV2-based model clearly outperforms all the variants of the EfficientNet-based models we tried so far. It's not only better performing, but it's also better in terms of memory footprint and training time. The memory footprint can be reduced even further with the help of quantization.
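On that last point, here's a minimal sketch of post-training dynamic-range quantization with the TFLite converter; the file name is illustrative and the actual size savings will vary:

```python
import tensorflow as tf

# Convert the trained Keras model (e.g., the MobileNetV2-based one)
# to TFLite with dynamic-range quantization
converter = tf.lite.TFLiteConverter.from_keras_model(mobilenet_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open("mobilenet_v2_quantized.tflite", "wb") as f:
    f.write(tflite_model)
```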
[W&B run set panel: 5 runs]
Concluding Remarks
So, for our dataset, the EfficientNet family of models did not perform particularly well, but that does not in any way diminish their significance.
If you have a relatively large dataset, you should definitely give those models a try. At the same time, we should keep in mind that we don't need a hammer to kill a rat.