TensorFlow is one of the most popular deep learning frameworks. Its GPU support can cut neural network training time dramatically.
Google Colab and Kaggle Kernels are two popular platforms that many machine learning practitioners turn to, as they offer off-the-shelf GPU support alongside readily available Python packages, letting you focus on your machine learning workflow.
However, they limit your GPU hours and offer only temporary, limited file storage, which can be a bottleneck in your ML workflow. Debugging your machine learning pipeline can also be hard on these platforms.
Using your own GPU-enabled machine can be really helpful, especially for developing your pipeline. However, setting up your local environment to leverage the power of a GPU is an involved process, and the complexity can put people off trying.
This report is written with the intent to make the process less daunting so that everyone can leverage the power of a local GPU. If you have a GPU-enabled machine and use TensorFlow, this report is meant for you.
For clarity, here are my system specs:
Operating System: Windows 10
GPU: Nvidia GeForce MX250
Laptops usually come with GPU drivers and CUDA preinstalled. If you don't know the name of your GPU, go to Device Manager > Display adapters to find it.
Available versions: TensorFlow 2.1, 2.2, and 2.3 work off-the-shelf with CUDA 10.1 and cuDNN 7.6. You can see the tested build configurations here. It's also important to have Nvidia GPU driver 418.x or higher installed. The easiest way to find the versions available on your system is to open your command prompt and type nvidia-smi. This also tells you whether your GPU is CUDA capable.
-> Figure 1: Call nvidia-smi to learn about available driver and CUDA versions. <-
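If you want to script this check, you can parse the header line that nvidia-smi prints. A minimal sketch in Python (the sample header line and the version numbers in it are illustrative, not what your machine will necessarily report):

```python
import re

# Illustrative nvidia-smi header line; the versions here are examples,
# not a guarantee of what your system will print.
SAMPLE_HEADER = (
    "| NVIDIA-SMI 456.71       Driver Version: 456.71       CUDA Version: 11.1     |"
)

def parse_versions(header):
    """Extract (driver_version, cuda_version) from an nvidia-smi header line."""
    driver = re.search(r"Driver Version:\s*([\d.]+)", header)
    cuda = re.search(r"CUDA Version:\s*([\d.]+)", header)
    return (
        driver.group(1) if driver else None,
        cuda.group(1) if cuda else None,
    )

print(parse_versions(SAMPLE_HEADER))  # ('456.71', '11.1')
```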
We can either start by installing the correct drivers, CUDA, and cuDNN or by installing TensorFlow.
In your Anaconda command prompt, create a new conda environment and then install TensorFlow using pip. You can also install via conda, but you might not get the latest release. This installs TensorFlow 2.3 unless you pin a different version, in which case you will have to make sure that you have the matching CUDA and cuDNN versions. Either way, the process remains more or less the same.

```
# create a new conda environment
> conda create -n tf2
# activate the environment
> conda activate tf2
# install tensorflow
> pip install tensorflow
```
To check if you have successfully installed TensorFlow, simply open your Python console and import TensorFlow.
-> Figure 2: Import TensorFlow to see if you have successfully installed TensorFlow. <-
Note: The warning Could not load dynamic library 'cudart64_101.dll'; dlerror: cudart64_101.dll not found was raised because CUDA 10.1 was not installed on my system, and TensorFlow requires it to use the GPU. If you have already installed the correct versions of CUDA and cuDNN and you still get this warning, then you have not set the %PATH% environment variable correctly. We will come back to this later in this report.
TensorFlow 2.1, 2.2, and 2.3 require CUDA 10.1, and CUDA 10.1 requires Nvidia GPU driver 418.x or higher.
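That driver requirement is easy to check programmatically against the version nvidia-smi reports. A small sketch (meets_minimum is a helper written here for illustration, not an Nvidia or TensorFlow API, and the version strings are examples):

```python
# Minimum driver version that CUDA 10.1 needs, as a tuple for comparison.
MIN_DRIVER = (418,)

def meets_minimum(driver_version, minimum=MIN_DRIVER):
    """Compare a dotted driver version string against a minimum version tuple."""
    parts = tuple(int(p) for p in driver_version.split("."))
    return parts >= minimum

print(meets_minimum("456.71"))  # True: 456.x is newer than 418.x
print(meets_minimum("390.77"))  # False: too old for CUDA 10.1
```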
First, verify that you have a CUDA-capable GPU. Check if your GPU is listed here. If you don't find your GPU there, check the legacy GPU page here. Note that the lists are not exhaustive: I couldn't find the GeForce MX250 on either of them, but I found this reddit thread claiming it is CUDA capable.
Go to this website to download CUDA 10.1 for Windows.
Install the toolkit using the downloaded .exe file. It will first run a system compatibility check. The CUDA toolkit also requires a supported version of Microsoft Visual Studio (MSVS). In my case the check couldn't find the required version of MSVS; however, I continued with the setup anyway (more on this in this thread), and it turned out that I was able to train my model on the GPU without any hassle.
-> Figure 3: Install CUDA 10.1 ToolKit with default settings. <-
You can find the complete installation guide here.
TensorFlow requires cuDNN 7.6. Installing this is easy.
Download cuDNN v7.6.5 (November 5th, 2019), for CUDA 10.1. You will probably have to create an Nvidia account, fill out a quick survey, and accept some terms and conditions to download cuDNN. This downloads a zip file; unzip it in a location of your choice, then copy the following files from the unzipped cuda folder into your CUDA installation:
Copy cuda\bin\cudnn64_7.dll to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\vx.x\bin.
Copy cuda\include\cudnn.h to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\vx.x\include.
Copy cuda\lib\x64\cudnn.lib to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\vx.x\lib\x64.
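These copy steps can also be scripted. A hedged sketch using Python's shutil (the extract path below is an assumed example, and the file names are the ones shipped in the cuDNN 7.6.5 zip):

```python
import shutil
from pathlib import Path

# Illustrative paths: adjust CUDNN_EXTRACT_DIR to wherever you unzipped cuDNN
# and CUDA_DIR to your actual CUDA install directory.
CUDNN_EXTRACT_DIR = Path(r"C:\tools\cudnn-10.1-windows10-x64-v7.6.5.32\cuda")
CUDA_DIR = Path(r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1")

def install_cudnn(src, dst):
    """Copy the cuDNN DLL, header, and import library into the CUDA tree."""
    mapping = {
        "bin/cudnn64_7.dll": "bin",
        "include/cudnn.h": "include",
        "lib/x64/cudnn.lib": "lib/x64",
    }
    copied = []
    for rel_src, rel_dst in mapping.items():
        target_dir = dst / rel_dst
        target_dir.mkdir(parents=True, exist_ok=True)  # create dir if missing
        copied.append(shutil.copy2(src / rel_src, target_dir))
    return copied

# Usage (run as administrator, since Program Files is write-protected):
# install_cudnn(CUDNN_EXTRACT_DIR, CUDA_DIR)
```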
Open up your command prompt.
The easiest way to set up the environment variables is to use:

```
C:\> SET PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin;%PATH%
C:\> SET PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\extras\CUPTI\lib64;%PATH%
C:\> SET PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\include;%PATH%
C:\> SET PATH=<installpath>\cuda\bin;%PATH%
```

Note that SET only changes PATH for the current command-prompt session; to make the change permanent, edit PATH under System Properties > Environment Variables.
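To confirm the PATH update worked, you can check whether the libraries TensorFlow complains about are now discoverable. A small diagnostic sketch (find_on_path is a helper defined here for illustration, not a TensorFlow API; the DLL names are the ones from the warnings TensorFlow 2.1-2.3 emits):

```python
import os

# DLLs TensorFlow looks for on %PATH% at import time; if every directory check
# below succeeds, the "Could not load dynamic library" warnings should go away.
REQUIRED_DLLS = ["cudart64_101.dll", "cudnn64_7.dll"]

def find_on_path(filename, path=None):
    """Return the first directory on PATH containing filename, else None."""
    path = path if path is not None else os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        if directory and os.path.isfile(os.path.join(directory, filename)):
            return directory
    return None

for dll in REQUIRED_DLLS:
    location = find_on_path(dll)
    print(dll, "->", location if location else "NOT found on PATH")
```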
And that's it! You are set up for some serious deep learning now.
Now that you have successfully installed TensorFlow let's train a simple neural network and check GPU usage. Weights and Biases can automatically log important GPU metrics.
Open the Anaconda command prompt and activate the environment with conda activate tf2.
Install Weights and Biases: pip install --upgrade wandb.
Download test-gpu-wandb.py from this GitHub Gist. You will need to cd to the directory where you downloaded the script.
Run the script with python test-gpu-wandb.py. It will train a simple MNIST image classification model using your local GPU.
Weights and Biases will automatically log GPU metrics as shown below. :point_down: