TensorFlow suddenly not detecting GPU on my GCP VM
I created a GCP VM with the latest TensorFlow release, i.e., 2.5.0. Everything seemed to work fine, but then TensorFlow suddenly failed to detect the GPU even though it had been detecting it in the past.
The GCP VMs are created with CUDA 11.0; however, TensorFlow >= 2.5.0 requires CUDA 11.2. What's interesting is that everything worked fine for some time!
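As a sanity check, you can compare the CUDA version your TensorFlow wheel was built against with what the NVIDIA driver reports. Here's a minimal sketch, assuming a standard TF 2.5 install where tf.sysconfig.get_build_info() exposes the cuda_version and cudnn_version keys and nvidia-smi is on the PATH:

import subprocess
import tensorflow as tf

# CUDA/cuDNN versions this TensorFlow wheel was compiled against.
build = tf.sysconfig.get_build_info()
print("TF version:", tf.__version__)
print("Built for CUDA:", build.get("cuda_version"))
print("Built for cuDNN:", build.get("cudnn_version"))

# What the installed NVIDIA driver actually supports (the mismatch shows up here).
print(subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout)

If the "Built for CUDA" value is newer than what nvidia-smi reports, you're in the same situation I was.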
After messing around with TensorFlow versions and failing to update CUDA, I decided to delete the VM and create a new one. Obviously, I had to migrate the training pipeline. A total waste of time.
Turns out there's a really easy fix for this issue: update CUDA from 11.0 to 11.2. Yes, I failed to do it in the past, but luckily I found this Stack Overflow answer that simply pointed to a Google Cloud documentation page.
Solution
- Open a terminal in your GCP notebook instance.
- curl -O https://storage.googleapis.com/nvidia-drivers-us-public/GRID/GRID12.1/NVIDIA-Linux-x86_64-460.32.03-grid.run
- sudo bash NVIDIA-Linux-x86_64-460.32.03-grid.run
You will have to OK a few things by pressing Enter on your keyboard.
- Now run nvidia-smi and note that the reported CUDA version has been updated to 11.2.

- To test whether the GPU is being detected, type python in the terminal to open a Python shell.
>>> import tensorflow as tf
>>> tf.__version__
'2.5.0'
>>> tf.config.list_physical_devices('GPU')
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
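If you'd rather have a one-shot check than an interactive shell, the sketch below lists the visible GPUs and runs a tiny matmul on one to confirm the device actually works; the file name gpu_check.py is just a placeholder of my choosing.

import tensorflow as tf

# List the GPUs TensorFlow can see.
gpus = tf.config.list_physical_devices("GPU")
print("TensorFlow", tf.__version__, "sees", len(gpus), "GPU(s):", gpus)

if gpus:
    # Run a small matmul explicitly on the first GPU to confirm it really works.
    with tf.device("/GPU:0"):
        a = tf.random.normal((1024, 1024))
        b = tf.random.normal((1024, 1024))
        c = tf.matmul(a, b)
    print("Matmul on GPU succeeded, result shape:", c.shape)
else:
    print("No GPU detected; the driver update may not have taken effect.")

Run it with python gpu_check.py; if the matmul line prints, the driver update worked end to end.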
It solved my problem, but I can't confidently say this is the best solution. TensorFlow can mess with your head real good. :)