Unable to get pytorch working with CUDA 11.3

Michael_Lucy · June 4, 2022, 1:32am

Hi,

When I install either pytorch 1.11 or the nightly version with CUDA 11.3, torch.cuda.is_available() returns False. (I can’t use CUDA version 10.2 because I’m trying to use a 3090.)

Any idea what might be causing this? I posted the output of torch.utils.collect_env below.

(Also, I’m trying to install on wsl2 if that’s relevant.)

Collecting environment information...
PyTorch version: 1.13.0.dev20220603+cu113
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: 10.0.0-4ubuntu1
CMake version: version 3.16.3
Libc version: glibc-2.31

Python version: 3.8.10 (default, Mar 15 2022, 12:22:08)  [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.10.16.3-microsoft-standard-WSL2-x86_64-with-glibc2.29
Is CUDA available: False
CUDA runtime version: 11.3.58
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3090
Nvidia driver version: 512.95
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.4.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.4.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.4.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.4.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.4.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.4.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.4.1
/usr/local/cuda-11.7/targets/x86_64-linux/lib/libcudnn.so.8.4.1
/usr/local/cuda-11.7/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.4.1
/usr/local/cuda-11.7/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.4.1
/usr/local/cuda-11.7/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.4.1
/usr/local/cuda-11.7/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.4.1
/usr/local/cuda-11.7/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.4.1
/usr/local/cuda-11.7/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.4.1
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.22.4
[pip3] torch==1.13.0.dev20220603+cu113
[pip3] torchaudio==0.12.0.dev20220603+cu113
[pip3] torchvision==0.14.0.dev20220603+cu113
[conda] Could not collect

ptrblck · June 4, 2022, 6:27am

Are you able to run any CUDA applications (e.g. the CUDA samples) at all in your current setup?
I assume that running python -c "import torch; print(torch.randn(1).cuda())" gives you an error complaining about the driver? If so, do you see any issues in nvidia-smi or dmesg? The driver might be in a “bad state”.
E.g. a common driver/Linux interaction issue is to see this behavior after waking Linux from “suspend” mode. In my setup, running:

sudo rmmod nvidia_uvm
sudo modprobe nvidia_uvm

usually helps.

Michael_Lucy · June 4, 2022, 6:42pm

I do seem to be able to run the CUDA examples:

mlucy@DESKTOP-MIBK3CH:~/cuda-samples/bin/x86_64/linux/release$ ./eigenvalues
Starting eigenvalues
GPU Device 0: "Ampere" with compute capability 8.6

Matrix size: 2048 x 2048
Precision: 0.000010
Iterations to be timed: 100
Result filename: 'eigenvalues.dat'
Gerschgorin interval: -2.894310 / 2.923303
Average time step 1: 0.920770 ms
Average time step 2, one intervals: 1.084130 ms
Average time step 2, mult intervals: 2.393340 ms
Average time TOTAL: 4.448770 ms
Test Succeeded!

When I run the command you gave, I get this, not a complaint about the driver:

mlucy@DESKTOP-MIBK3CH:~/cuda-samples/bin/x86_64/linux/release$ python -c "import torch; print(torch.randn(1).cuda())"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/mlucy/.local/lib/python3.8/site-packages/torch/cuda/__init__.py", line 217, in _lazy_init
    torch._C._cuda_init()
RuntimeError: No CUDA GPUs are available

I followed the instructions in the “CUDA on WSL” guide (CUDA Toolkit Documentation) for setting up CUDA with wsl2, and they said to install a display driver on the Windows side but not the Linux side.
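
Since the CUDA samples run but PyTorch reports no GPUs, one thing worth checking is whether more than one copy of libcuda.so is installed, because different programs may end up loading different ones. Below is a minimal sketch of such a check. The function name find_libcuda and the default search roots are assumptions chosen for illustration; they are not part of PyTorch or of the NVIDIA tooling.

```python
import glob
import os


def find_libcuda(roots=("/usr/lib", "/usr/local")):
    """Return a sorted list of every libcuda.so* file found under the given roots.

    The default roots are an assumption; pass other directories if your
    libraries live elsewhere.
    """
    found = set()
    for root in roots:
        # "**" with recursive=True descends into all subdirectories of root.
        pattern = os.path.join(root, "**", "libcuda.so*")
        for path in glob.glob(pattern, recursive=True):
            found.add(path)
    return sorted(found)


if __name__ == "__main__":
    for path in find_libcuda():
        # realpath shows where a symlink such as libcuda.so.1 finally points.
        print(path, "->", os.path.realpath(path))
```

If this prints copies in more than one directory, the dynamic loader's search order decides which one a given process picks up.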

Michael_Lucy · June 4, 2022, 6:46pm

Actually, I think I just found the fix. The problem is that installing the NVIDIA driver on the Windows side put a new libcuda.so into the Linux side in an odd place. If I add LD_LIBRARY_PATH=/usr/lib/wsl/lib/ to force pytorch to use /usr/lib/wsl/lib/libcuda.so preferentially, everything runs fine.
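
For anyone who wants to script this fix, here is a rough sketch in Python. The dynamic loader reads LD_LIBRARY_PATH when a process starts, so changing os.environ inside an interpreter that is already running is generally too late; the sketch therefore builds a modified environment and runs the check in a fresh interpreter. The helper names env_with_wsl_lib_first and cuda_available_in_child are made up for this example, and only the /usr/lib/wsl/lib path comes from the post above.

```python
import os
import subprocess
import sys

# Directory where the Windows-side driver places libcuda.so (from the post above).
WSL_LIB_DIR = "/usr/lib/wsl/lib"


def env_with_wsl_lib_first(env=None, wsl_lib_dir=WSL_LIB_DIR):
    """Return a copy of env with wsl_lib_dir moved to the front of LD_LIBRARY_PATH."""
    env = dict(os.environ if env is None else env)
    current = env.get("LD_LIBRARY_PATH", "")
    # Drop empty entries and any existing occurrence so the directory appears once.
    others = [p for p in current.split(":") if p and p != wsl_lib_dir]
    env["LD_LIBRARY_PATH"] = ":".join([wsl_lib_dir] + others)
    return env


def cuda_available_in_child(env):
    """Run torch.cuda.is_available() in a new interpreter started with env."""
    result = subprocess.run(
        [sys.executable, "-c", "import torch; print(torch.cuda.is_available())"],
        env=env,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print("CUDA available:", cuda_available_in_child(env_with_wsl_lib_first()))
```

To make the setting permanent, the same effect can be had by exporting LD_LIBRARY_PATH in the shell profile before starting Python.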

Thanks for the help! I was stuck on that for a long time, and putting me on to looking at the driver documentation again was what got me unstuck.