torch.cuda.is_available() returns False (Error 304 from torch._C._cuda_getDeviceCount() > 0)

I used the official PyTorch Docker image:

docker pull pytorch/pytorch:2.5.1-cuda12.1-cudnn9-devel

My hosts are WSL2 and CentOS 8 (I tried both and got the same error), and the installed NVIDIA drivers on both support CUDA 12.4. One machine has an RTX 4090, the other an NVIDIA A10.

The nvidia-smi output is normal, but torch.cuda.is_available() returns False. However, when I run:

CUDA_DEVICE_ORDER="PCI_BUS_ID" PYTORCH_NVML_BASED_CUDA_CHECK=1 python -c "import torch; print(torch.cuda.is_available())"

It returns True.

But in practice, when I run:

CUDA_DEVICE_ORDER="PCI_BUS_ID" PYTORCH_NVML_BASED_CUDA_CHECK=1 python demo.py

it still fails with the error shown below (demo.py is my LLM inference script).
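These two results are consistent with how torch.cuda.is_available() behaves when PYTORCH_NVML_BASED_CUDA_CHECK=1 is set: according to the PyTorch docs, it then asks NVML whether the driver is functional instead of calling the CUDA runtime, which is a weaker check. Real work, such as allocating a tensor, still has to initialize the CUDA runtime, and that is where cudaGetDeviceCount() fails. A minimal sketch for seeing both checks side by side, assuming it is run with the container's Python; `probe` and `PROBE_SRC` are hypothetical names:

```python
import os
import subprocess
import sys

# Child-process script: report the availability check, then force real
# CUDA runtime initialization by allocating a tensor on the GPU.
PROBE_SRC = """
import torch
print("is_available:", torch.cuda.is_available())
try:
    torch.zeros(1, device="cuda")
    print("allocation: ok")
except Exception as exc:  # RuntimeError if CUDA init fails, AssertionError on CPU-only builds
    print("allocation failed:", exc)
"""


def probe(extra_env):
    """Run PROBE_SRC in a fresh interpreter with extra environment variables.

    A fresh process per run keeps the checks isolated and guarantees the
    environment variable is set before torch is imported.
    """
    env = dict(os.environ, **extra_env)
    result = subprocess.run(
        [sys.executable, "-c", PROBE_SRC],
        env=env,
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout + result.stderr


if __name__ == "__main__":
    for label, extra in [
        ("default check", {}),
        ("NVML-based check", {"PYTORCH_NVML_BASED_CUDA_CHECK": "1"}),
    ]:
        code, output = probe(extra)
        print(f"--- {label} (exit code {code}) ---")
        print(output.strip())
```

If the NVML-based run prints `is_available: True` but the allocation still fails, the environment variable is only changing the answer to the question, not fixing CUDA initialization.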

I’ve also tried other official versions:

  • pytorch/pytorch:2.4.0-cuda12.4-cudnn9-devel
  • pytorch/pytorch:2.3.1-cuda12.4-cudnn8-devel

But torch.cuda.is_available() still returns False.
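Because several PyTorch builds fail the same way, one way to narrow this down is to call the CUDA driver API directly, with no PyTorch involved. Note that nvidia-smi only talks to NVML, so its normal output does not prove that cuInit() works. The sketch below is a hedged diagnostic, not an official tool: `check_driver` is a hypothetical helper, and it assumes the driver library is visible in the container as `libcuda.so.1` (on WSL2 this library is usually provided from /usr/lib/wsl/lib).

```python
import ctypes

# CUresult value from the CUDA driver API (cuda.h). For reference,
# 304 is CUDA_ERROR_OPERATING_SYSTEM, the same code as in the post.
CUDA_SUCCESS = 0


def check_driver(lib_names=("libcuda.so.1", "libcuda.so")):
    """Load libcuda directly and call cuInit(0), bypassing PyTorch entirely.

    Returns a short human-readable status string.
    """
    lib = None
    for name in lib_names:
        try:
            lib = ctypes.CDLL(name)
            break
        except OSError:
            continue
    if lib is None:
        return "libcuda not found by the dynamic loader"

    # cuDriverGetVersion encodes the version as 1000 * major + 10 * minor.
    version = ctypes.c_int(0)
    if lib.cuDriverGetVersion(ctypes.byref(version)) == CUDA_SUCCESS:
        driver = f"{version.value // 1000}.{(version.value % 1000) // 10}"
    else:
        driver = "unknown"

    rc = lib.cuInit(0)
    if rc != CUDA_SUCCESS:
        return f"cuInit failed with CUresult {rc} (driver supports CUDA {driver})"

    count = ctypes.c_int(0)
    lib.cuDeviceGetCount(ctypes.byref(count))
    return f"cuInit ok, {count.value} device(s), driver supports CUDA {driver}"


if __name__ == "__main__":
    print(check_driver())
```

If cuInit also returns 304 inside the failing container, the problem most likely sits below PyTorch (the driver libraries or runtime setup the container sees) rather than in the PyTorch build itself.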

Surprisingly, I have another image (pytorch:2.3.1-cuda12.4-cudnn8), which was not pulled from the official PyTorch repository, and it works perfectly fine.
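To compare the working image with the official ones under identical conditions, it can help to run the same probe in each and diff the output (PyTorch version, CUDA version PyTorch was built with, availability, and LD_LIBRARY_PATH). A rough sketch, assuming the containers are started with `docker run --gpus all` via the NVIDIA Container Toolkit; if you normally start them with other flags, use those instead. The last tag is a placeholder for the non-official image, whose full repository name is not given, and `docker_cmd` is a hypothetical helper:

```python
import shutil
import subprocess

# Images from the post; the last tag is a placeholder for the working,
# non-official image.
IMAGES = [
    "pytorch/pytorch:2.5.1-cuda12.1-cudnn9-devel",
    "pytorch/pytorch:2.4.0-cuda12.4-cudnn9-devel",
    "pytorch/pytorch:2.3.1-cuda12.4-cudnn8-devel",
    "pytorch:2.3.1-cuda12.4-cudnn8",
]

# One-liner executed inside each container.
PROBE = (
    "import os, torch; "
    "print(torch.__version__, torch.version.cuda, torch.cuda.is_available()); "
    "print('LD_LIBRARY_PATH=' + os.environ.get('LD_LIBRARY_PATH', ''))"
)


def docker_cmd(image):
    """Build a `docker run` command that runs PROBE with all GPUs attached."""
    return ["docker", "run", "--rm", "--gpus", "all", image, "python", "-c", PROBE]


if __name__ == "__main__":
    if shutil.which("docker") is None:
        print("docker not found on PATH; nothing to compare")
    else:
        for image in IMAGES:
            print(f"=== {image} ===")
            done = subprocess.run(docker_cmd(image), capture_output=True, text=True)
            print((done.stdout + done.stderr).strip())
```

Differences in LD_LIBRARY_PATH between the working and failing images are a reasonable first place to look, since they control which CUDA libraries get loaded.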

Help!! Please save me!! :sob:

Why? Why? Why?

>>> torch.cuda.device_count()
6
>>> torch.cuda.is_available()
/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py:129: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 304: OS call failed or operation not supported on this OS (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
  return torch._C._cuda_getDeviceCount() > 0
False
>>> torch.cuda.current_device() 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 940, in current_device
    _lazy_init()
  File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 319, in _lazy_init
    torch._C._cuda_init()
RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 304: OS call failed or operation not supported on this OS
>>> torch.cuda.device(0)
<torch.cuda.device object at 0x7f697c8f4070>
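The session above is consistent with how torch.cuda initializes lazily. As far as I can tell from the `torch/cuda/__init__.py` source of recent releases, torch.cuda.device_count() asks NVML first and only falls back to cudaGetDeviceCount() if NVML fails, so it can report 6 GPUs even when the CUDA runtime cannot start. torch.cuda.device(0) only builds a context-manager object and does not touch the GPU until it is entered. torch.cuda.current_device() is the first call here that runs _lazy_init(), which is where Error 304 surfaces. A minimal sketch that reproduces this ordering and reports the initialization error directly instead of as a warning, using the public torch.cuda.init(); `report_cuda_state` is a hypothetical helper name:

```python
def report_cuda_state():
    """Query torch.cuda in increasing order of how much it initializes."""
    try:
        import torch
    except ImportError:
        return {"error": "torch is not installed"}

    state = {
        "torch": torch.__version__,
        "built_with_cuda": torch.version.cuda,  # None for CPU-only builds
        # May be answered by NVML without initializing the CUDA runtime.
        "device_count": torch.cuda.device_count(),
    }
    try:
        # Goes through the same _lazy_init() path that current_device() hits.
        torch.cuda.init()
        state["init"] = "ok"
        state["current_device"] = torch.cuda.current_device()
    except Exception as exc:  # RuntimeError (e.g. Error 304) or AssertionError on CPU builds
        state["init"] = f"failed: {exc}"
    return state


if __name__ == "__main__":
    for key, value in report_cuda_state().items():
        print(f"{key}: {value}")
```

A nonzero device_count next to a failed init is the same pattern as in the log: NVML can see the GPUs, but the CUDA runtime cannot initialize.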