How to Prevent TensorFlow From Fully Allocating GPU Memory
In this report, we'll see how to prevent a common TensorFlow issue: allocating all of the available GPU memory by default
Problem
I work in an environment where computational resources are shared, i.e., we have a few server machines equipped with a few NVIDIA Titan X GPUs each.
For small to moderate size models, the 12 GB of the Titan X is usually enough for 2–3 people to run training concurrently on the same GPU. If the models are small, such that a single model does not fully utilize the GPU, it can result in a speedup compared to running one training process after the other. Even in cases where the concurrent access to the GPU does slow down the individual training time, it is still nice to have the flexibility of having multiple users simultaneously train on the GPU.
The problem with TensorFlow is that, by default, it allocates the full amount of available GPU memory when it is launched. Even for a small two-layer neural network, I see that all 12 GB of the GPU memory is used!
Is there a way to make TensorFlow only allocate, say, 4 GB of GPU memory? Since you're reading this introduction, I bet you know the answer is "yes."
Solution
TensorFlow, by default, allocates all of the available GPU memory to your model training. To use only a fraction of it, you need two things:
- The ability to easily monitor GPU usage and memory allocation while training your model. Weights & Biases can help: check out the report Use GPUs with Keras to learn more, and see the monitoring sketch after the configuration code below.
- The ability to allocate only the desired amount of memory for your model training. This is straightforward with TensorFlow 2.x; the code below demonstrates the implementation.
```python
# Ref: https://www.tensorflow.org/guide/gpu#limiting_gpu_memory_growth
import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    # Restrict TensorFlow to allocate only 4 GB of memory on the first GPU
    try:
        tf.config.experimental.set_virtual_device_configuration(
            gpus[0],
            [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=4096)])  # Notice here
        logical_gpus = tf.config.experimental.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        # Virtual devices must be set before GPUs have been initialized
        print(e)
```
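For the monitoring half of the solution, simply starting a W&B run is enough: `wandb.init()` streams system metrics, including per-GPU memory and utilization, in the background while your script runs. The snippet below is a minimal sketch, assuming the `wandb` package is installed and using a hypothetical project name.

```python
import wandb

# Starting a run is enough for W&B to record system metrics in the background,
# including GPU memory allocated and GPU utilization, while training runs.
run = wandb.init(project="tf-gpu-memory")  # hypothetical project name

# ... build and train your model here ...

run.finish()  # end the run and flush any pending metrics
```

If you would rather let TensorFlow grow its memory footprint on demand instead of fixing a hard cap, the same TensorFlow guide also describes `tf.config.experimental.set_memory_growth(gpu, True)` as an alternative.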
Quick Experiment
To demonstrate the effect, we ran a quick experiment: training a simple image classifier on the cats-vs-dogs dataset. A rough sketch of the setup follows the list below.
- In the first experiment, we let TensorFlow allocate GPU memory on its own. Try out experiment 1 on Google Colab.
- In the second experiment, we limit the GPU memory allocation. Try out experiment 2 on Google Colab.
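The linked notebooks contain the exact code; the block below is only a rough sketch of that setup under a few assumptions: TensorFlow 2.x with `tensorflow_datasets` available for loading `cats_vs_dogs`, a hypothetical W&B project name, a deliberately small model, and the memory-limit configuration from the Solution section applied before any GPU work starts.

```python
import tensorflow as tf
import tensorflow_datasets as tfds
import wandb

run = wandb.init(project="tf-gpu-memory")  # hypothetical project name

IMG_SIZE = 160
BATCH_SIZE = 32

def preprocess(image, label):
    # Resize to a fixed shape and scale pixel values to [0, 1].
    image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE)) / 255.0
    return image, label

# cats_vs_dogs ships a single 'train' split; carve a validation set out of it.
train_ds, val_ds = tfds.load(
    "cats_vs_dogs",
    split=["train[:80%]", "train[80%:]"],
    as_supervised=True,
)
train_ds = train_ds.map(preprocess).batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)
val_ds = val_ds.map(preprocess).batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)

# A deliberately small classifier; nowhere near enough work to need all 12 GB.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(IMG_SIZE, IMG_SIZE, 3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Train for 3 epochs, as in the report, logging the per-epoch metrics to W&B.
log_to_wandb = tf.keras.callbacks.LambdaCallback(
    on_epoch_end=lambda epoch, logs: wandb.log(logs)
)
model.fit(train_ds, validation_data=val_ds, epochs=3, callbacks=[log_to_wandb])

run.finish()
```

Because `wandb.init()` is called before training starts, the GPU memory and utilization panels are populated automatically alongside the logged training and validation losses.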
Observations
- The first plot shows the amount of GPU memory allocated. We can see the effect of limiting the allocation.
- The second plot shows GPU utilization; even with the limited memory allocation, TensorFlow still made the most of the GPU.
- The final plot shows the training and validation loss. We trained the model for only 3 epochs.
Run set (2 runs)