Compatibility issues between CUDA and NVIDIA Drivers

asamra18 · July 15, 2021, 1:21pm

Hi everyone, I’m trying to train an RL agent on my GPU to speed up processing, but for some reason PyTorch isn’t finding any devices. I’ve tried installing two versions of PyTorch from the install site, for CUDA 11.1 and 10.2, and neither works. Running nvidia-smi shows driver version 465.27 and CUDA version 11.3. I’m working with a GeForce RTX 2080 Ti on Ubuntu in a Conda environment. When I check the torch version by printing its value, I see it’s 11.1 and that no devices are available.

I know that this should work because when my PhD colleague runs our code on his machine, GPU training works properly. After updating drivers and trying different installations from the PyTorch website, I’m now completely stuck as to how to solve this.

ptrblck · July 15, 2021, 7:03pm

I guess your NVIDIA driver installation might not have been successful, as PyTorch cannot detect any GPUs, or alternatively, you are installing the CPU-only binaries.
We’ve seen these issues in the past when users forgot to restart their machine after a driver update, so maybe this is also the case for you?
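
To tell the two cases apart, a quick check along these lines might help. This is only a minimal sketch: the helper name `describe_torch_cuda` is made up for illustration, and it assumes you run it inside the same Conda environment you train in.

```python
import importlib.util


def describe_torch_cuda():
    """Collect the facts that separate a CPU-only build from a driver problem.

    Hypothetical helper, not part of PyTorch itself.
    """
    # Avoid crashing if torch is missing from the active environment.
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}

    import torch

    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version the binary was built with; None for CPU-only binaries.
        "built_with_cuda": torch.version.cuda,
        # Whether PyTorch can actually use a GPU through the installed driver.
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
    }


if __name__ == "__main__":
    for key, value in describe_torch_cuda().items():
        print(f"{key}: {value}")
```

If `built_with_cuda` prints `None`, the CPU-only binaries were installed. If it prints a CUDA version but `cuda_available` is `False`, the driver side is the more likely suspect, and a reboot after the driver update would be the first thing to try.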