
Using GPUs With Keras: A Tutorial With Code

This tutorial covers how to use GPUs for your deep learning models with Keras, from checking GPU availability right through to logging and monitoring usage.
Created on July 6|Last edited on March 3


An Introduction To Using Your GPU With Keras

This tutorial walks you through the Keras APIs that let you use, and have more control over, your GPU. We'll show you how to check GPU availability, change the default memory allocation for GPUs, explore memory growth, and use only a subset of GPU memory.
We'll use Weights & Biases to automatically log all our GPU and CPU utilization metrics, which makes it easy to monitor compute resource usage as we train multiple models.
If you'd like to follow along, here's a helpful Colab:
Try it on Colab Notebook


Checking Your GPU Availability With Keras

The easiest way to check if you have access to GPUs is to call tf.config.experimental.list_physical_devices('GPU').
This returns a list of the GPU devices visible to TensorFlow.
>>> print('GPU name: ', tf.config.experimental.list_physical_devices('GPU'))

GPU name: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

Using Your GPU For Model Training With Keras

If a TensorFlow operation has both CPU and GPU implementations, the GPU is used by default, so we don't need to change anything about our training pipeline to use a GPU.
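To see this in action, here's a minimal sketch (the model and synthetic data are hypothetical, just to exercise a training step). When a GPU is present, TensorFlow places the ops on it automatically; `tf.device` lets you pin a computation to a specific device explicitly:

```python
import numpy as np
import tensorflow as tf

# A tiny model; when a GPU is present, TensorFlow places these ops on it automatically.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# Synthetic data, just to exercise one training step.
x = np.random.rand(32, 8).astype("float32")
y = np.random.rand(32, 1).astype("float32")
model.fit(x, y, epochs=1, verbose=0)

# You can also pin a computation to a specific device explicitly.
with tf.device("/CPU:0"):
    z = tf.matmul(tf.ones((2, 2)), tf.ones((2, 2)))
print(z.device)  # shows which device executed the op
```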

Monitoring Your GPU Usage

If you are tracking your models using Weights & Biases, all your system metrics, including GPU utilization, are automatically logged every 2 seconds. Some of the most important logged metrics are GPU memory allocated, GPU utilization, and CPU utilization. You can see the full list of logged metrics here.
You can see a sample of these system metrics automatically logged by W&B while training a model below:


[Embedded W&B panel: system metrics for this run set]



Interpreting Your System Metrics

  • CPU Utilization: This metric shows CPU utilization during training. We can see that ~44% of the CPU is used, mostly while scaling the images to the [0, 1] range. CPUs are fully utilized for operations like data augmentation.
  • Disk I/O Utilization: This metric shows disk utilization. Since our Cats vs. Dogs dataset is not loaded into memory (given its size), the dataloader needs to fetch it from disk, so we see constant disk usage throughout training.
  • GPU Utilization: This is probably the most important metric as it tracks the percent of the time during which one or more operations were executing on the GPU. Ideally we want this to be 100%. In our case we have around 97% GPU usage.
  • GPU Accessing Memory: This measures the percent of time during which GPU memory was being read or written. We want this metric to be as low as possible, since we'd rather the GPU spend its time computing on data than accessing memory. Our GPU memory access time is around 45%.
  • GPU Memory Allocated: This is the amount of GPU memory allocated. By default TensorFlow allocates all of the available GPU memory. In our case around 73% is allocated.
  • GPU Temperature: If you have your own GPU setup, this metric is really helpful.
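If you'd also like to inspect GPU memory from inside your own code, TensorFlow exposes a small helper for it. A sketch, assuming TensorFlow 2.5+ (the call needs an actual GPU, so we guard for CPU-only machines):

```python
import tensorflow as tf

# get_memory_info reports the bytes TensorFlow has currently and at peak
# allocated on the device; it requires a real GPU, so guard for CPU-only runs.
gpus = tf.config.list_physical_devices("GPU")
if gpus:
    info = tf.config.experimental.get_memory_info("GPU:0")
    print(f"current: {info['current'] / 1e6:.1f} MB, peak: {info['peak'] / 1e6:.1f} MB")
else:
    print("No GPU found; nothing to report.")
```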

Memory Growth For GPU Allocation With Keras

By default, TensorFlow allocates all of a GPU's memory. At times, though, we need finer-grained control over GPU memory. For these cases, we can turn on memory growth by calling tf.config.experimental.set_memory_growth.
This method allocates only the GPU memory actually needed for runtime allocations. It starts out by allocating a small amount of memory, then as the model trains and more GPU memory is needed, the GPU memory is extended.
If you're curious, you can learn more about memory growth here.
I've also created a Colab to assist in this section of the tutorial:



Try it on Colab Notebook

# Ref: https://www.tensorflow.org/guide/gpu
import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        # Currently, memory growth needs to be the same across GPUs
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        logical_gpus = tf.config.experimental.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        # Memory growth must be set before GPUs have been initialized
        print(e)
Let's observe the effect that enabling memory growth has. We can see that a small amount of GPU memory was allocated at the start of the runtime, and that it grew substantially during training as the needs of the model training process grew.



[Embedded W&B panel: GPU memory allocated with memory growth enabled]
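The introduction also mentioned using only a subset of GPU memory. One way to do that, described in the same TensorFlow GPU guide, is to create a logical device with an explicit memory limit. A sketch, with the ~1 GB cap chosen arbitrarily; like memory growth, this must run before the GPU is initialized:

```python
import tensorflow as tf

# Cap TensorFlow at ~1 GB of GPU memory via a logical device with a memory limit.
# Like memory growth, this must be configured before the GPU is initialized.
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        tf.config.set_logical_device_configuration(
            gpus[0],
            [tf.config.LogicalDeviceConfiguration(memory_limit=1024)])
        logical_gpus = tf.config.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        print(e)
```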


Summary

In this article, you saw how you can leverage GPUs for your deep learning research using Keras, and use Weights & Biases to monitor your resource consumption. Check out this great article by Lambda Labs on tracking system resource utilization during training with Weights & Biases.

Weights & Biases

Weights & Biases helps you keep track of your machine learning experiments. Use our tool to log hyperparameters and output metrics from your runs, then visualize and compare results and quickly share findings with your colleagues.
Get started in 5 minutes.
