Using GPUs With Keras: A Tutorial With Code
This tutorial covers how to use GPUs for your deep learning models with Keras, from checking GPU availability right through to logging and monitoring usage.
Table of Contents
An Introduction To Using Your GPU With Keras
Checking Your GPU Availability With Keras
Using Your GPU For Model Training With Keras
Monitoring Your GPU Usage
Interpreting Your System Metrics
Memory Growth For GPU Allocation With Keras
Summary
Weights & Biases
Recommended Reading
An Introduction To Using Your GPU With Keras
This tutorial walks you through the APIs that give you more control over your GPU when training Keras models. We will show you how to check GPU availability, change the default memory allocation for GPUs, turn on memory growth, and use only a subset of GPU memory.
We'll use Weights & Biases to automatically log all our GPU and CPU utilization metrics, which makes it easy to monitor our compute resource usage as we train a plethora of models.
If you'd like to follow along, here's a helpful Colab to assist:
Try it on Colab Notebook
Checking Your GPU Availability With Keras
The easiest way to check if you have access to GPUs is to call tf.config.experimental.list_physical_devices('GPU').
This will return a list of the GPU devices visible to TensorFlow.
>>> print('GPU name: ', tf.config.experimental.list_physical_devices('GPU'))
GPU name: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
Using Your GPU For Model Training With Keras
If a TensorFlow operation has both CPU and GPU implementations, the GPU will be used by default. So we don't need to change anything about our training pipeline to use a GPU.
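As a minimal sketch (the model and data below are placeholders, not from the original article), you can confirm that operations land on the GPU by enabling device placement logging before building your model:

import tensorflow as tf

# Log which device (CPU or GPU) each operation is placed on.
tf.debugging.set_log_device_placement(True)

# A tiny placeholder model -- any Keras model behaves the same way.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(32,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# With a GPU available, the dense layers' matmuls run on /device:GPU:0
# without any extra configuration.
x = tf.random.normal((256, 32))
y = tf.random.normal((256, 1))
model.fit(x, y, epochs=1, batch_size=64)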
Monitoring Your GPU Usage
If you are tracking your models using Weights & Biases, all your system metrics, including GPU utilization, will be automatically logged every 2 seconds. Some of the most important metrics logged are GPU memory allocated, GPU utilization, and CPU utilization. You can see the full list of logged metrics here.
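For reference, here is a minimal sketch of what that tracking setup can look like (the project name, model, and data variables are stand-ins, not from the original article):

import wandb
from wandb.keras import WandbCallback

# Start a run; system metrics (GPU/CPU utilization, memory, etc.) are
# collected in the background for the lifetime of the run.
wandb.init(project="keras-gpu-demo")  # hypothetical project name

# `model`, `x_train`, `y_train`, `x_val`, `y_val` are assumed to exist.
model.fit(x_train, y_train,
          epochs=5,
          validation_data=(x_val, y_val),
          callbacks=[WandbCallback()])

wandb.finish()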
You can see a sample of these system metrics automatically logged by W&B while training a model below:
[W&B panel: system metrics logged during the training run]
Interpreting Your System Metrics
- CPU Utilization: This metric shows CPU utilization during training. We can see that ~44% of the CPU is used, mostly while scaling the images to the [0, 1] range. CPUs are fully utilized for operations like data augmentation.
- Disk I/O Utilization: This metric shows disk utilization. Since our Cats vs. Dogs dataset is not loaded into memory (given its size), the dataloader needs to fetch it from disk, so we see constant disk usage throughout training.
- GPU Utilization: This is probably the most important metric, as it tracks the percentage of time during which one or more operations were executing on the GPU. Ideally we want this to be 100%. In our case, GPU usage is around 97%.
- GPU Accessing Memory: This measures the percentage of time during which GPU memory was being read or written. We want this metric to be as low as possible, since we want the GPU to spend its time computing on data rather than accessing memory. Our GPU memory access time is around 45%.
- GPU Memory Allocated: This is the amount of GPU memory allocated. By default, TensorFlow allocates all of the available GPU memory. In our case, around 73% is allocated (see the snippet after this list for one way to query this from TensorFlow directly).
- GPU Temperature: If you are running on your own hardware, this metric helps you spot overheating and thermal throttling.
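As a small, hedged sketch (relying on tf.config.experimental.get_memory_info, which is available in recent TensorFlow releases), you can also query current and peak GPU memory usage from inside your own code:

import tensorflow as tf

# Requires a visible GPU and a TensorFlow release that provides
# tf.config.experimental.get_memory_info.
if tf.config.experimental.list_physical_devices('GPU'):
    info = tf.config.experimental.get_memory_info('GPU:0')
    # Values are reported in bytes; convert to MiB for readability.
    print(f"Current GPU memory usage: {info['current'] / 1024**2:.1f} MiB")
    print(f"Peak GPU memory usage:    {info['peak'] / 1024**2:.1f} MiB")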
Memory Growth For GPU Allocation With Keras
By default, Keras (via TensorFlow) allocates all of the memory of a GPU. But at times we need finer-grained control over GPU memory. For these cases, we can turn on memory growth by calling tf.config.experimental.set_memory_growth.
This option makes TensorFlow allocate only as much GPU memory as it actually needs at runtime: it starts by allocating a small amount of memory and extends the allocation as the model's needs grow during training.
I've also created a Colab to assist in this section of the tutorial:
Try it on Colab Notebook
# Ref: https://www.tensorflow.org/guide/gpu
import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        # Currently, memory growth needs to be the same across GPUs
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        logical_gpus = tf.config.experimental.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        # Memory growth must be set before GPUs have been initialized
        print(e)
Let's observe the effect enabling memory growth has. We can see that a small amount of GPU memory was allocated at the start of the runtime, and that the allocation grew substantially during training as the needs of the model training process grew.
[W&B panel: GPU memory allocation with memory growth enabled]
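The intro also mentioned restricting TensorFlow to a subset of GPU memory. Here is a hedged sketch of one way to do this, following the same TensorFlow GPU guide referenced above (the 1024 MB limit is an arbitrary example value):

import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        # Cap the first GPU at 1024 MB by creating a single logical device
        # with an explicit memory limit (example value, adjust as needed).
        tf.config.experimental.set_virtual_device_configuration(
            gpus[0],
            [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024)])
        logical_gpus = tf.config.experimental.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        # Virtual devices must be set before GPUs have been initialized
        print(e)

Like memory growth, this must be configured before the GPU is initialized, so it belongs at the very top of your training script.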
Summary
In this article, you saw how you can leverage GPUs for your deep learning research using Keras, and how to use Weights & Biases to monitor your resource consumption. Check out this great article by Lambda Labs on tracking system resource utilization during training with Weights & Biases.
Weights & Biases
Weights & Biases helps you keep track of your machine learning experiments. Use our tool to log hyperparameters and output metrics from your runs, then visualize and compare results and quickly share findings with your colleagues.
Recommended Reading
Setting Up TensorFlow And PyTorch Using GPU On Docker
A short tutorial on setting up TensorFlow and PyTorch deep learning models on GPUs using Docker.
How to Compare Keras Optimizers in Tensorflow for Deep Learning
A short tutorial outlining how to compare Keras optimizers for your deep learning pipelines in Tensorflow, with a Colab to help you follow along.
LSTM RNN in Keras: Examples of One-to-Many, Many-to-One & Many-to-Many
In this report, I explain long short-term memory (LSTM) recurrent neural networks (RNN) and how to build them with Keras. Covering One-to-Many, Many-to-One & Many-to-Many.
Optimizing Models with Post-Training Quantization in Keras - Part I
Performing Facial Keypoints Detection with Post-Training Quantization in Keras