CUDA compatibility

Bryan_Arguello · March 5, 2024, 10:44pm

Hello,

My application requires PyTorch 2.2. When I look at the Get Started guide, it looks like that version of PyTorch only supports CUDA 11.8 and 12.1.

I am working on NVIDIA V100 and A100 GPUs, and NVIDIA does not supply drivers for those cards that are compatible with either CUDA 11.8 or 12.1.

Are these really the only versions of CUDA that work with PyTorch 2.2? Would using a CUDA version like 11.7, 12.0, or 12.2 cause any issues?

eqy · March 5, 2024, 10:52pm

I am not sure where you are seeing that the drivers for the A100 and V100 are incompatible with, e.g., CUDA 12.1 or 11.8. See, for example, the list of supported cards in "Version 525.60.13(Linux)/527.41(Windows) :: NVIDIA Data Center GPU Driver Documentation", which is the oldest compatible driver according to "CUDA Compatibility :: NVIDIA Data Center GPU Driver Documentation".

Furthermore, the CUDA versions you are referring to are the ones PyTorch provides prebuilt binaries for. You are also free to build PyTorch from source (and PyTorch’s CUDA components using your local CUDA toolkit) if you wish to use a newer CUDA toolkit.
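
To see which CUDA version an installed prebuilt binary was built with, something like the following minimal sketch may help. The helper name pytorch_cuda_build is hypothetical; torch.version.cuda is the attribute PyTorch exposes for this. The sketch returns None when PyTorch is not installed so that it can run anywhere.

```python
import importlib.util


def pytorch_cuda_build():
    """Return the CUDA version the installed PyTorch was built with, or None.

    None is returned both when PyTorch is not installed and when the
    installed build is CPU-only (torch.version.cuda is None in that case).
    """
    if importlib.util.find_spec("torch") is None:
        return None  # PyTorch is not installed in this environment
    import torch

    return torch.version.cuda


print("PyTorch CUDA build:", pytorch_cuda_build())
```

The reported value describes the CUDA runtime the binary was built against, not the locally installed toolkit or the driver.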

Bryan_Arguello · March 5, 2024, 11:03pm

I tried to download an NVIDIA driver from their advanced driver search page.
If you look at the screenshot below, it does not include 11.8 or 12.1.
I chatted with a rep at NVIDIA and they confirmed that this driver search page is accurate.

I acknowledge that I might be missing something (such as building PyTorch from source).
Since I can do that, I believe that I can install CUDA 12.2 on our GPUs and just build PyTorch from source.

eqy · March 5, 2024, 11:09pm

It’s unfortunate that the download page is missing those entries. However, minor version compatibility should apply, so you should be able to use, e.g., the driver listed for 12.0 with 12.1, and the driver listed for 11.7 with 11.8.
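
The minor version compatibility rule can be sketched as a simple check: a driver is acceptable for any toolkit in a CUDA major release as long as it meets that release's minimum driver version. This is a rough illustration only. The function names and the lookup table are hypothetical; the 525.60.13 minimum for CUDA 12.x is the figure cited earlier in this thread, while the 450.80.02 minimum for CUDA 11.x is an assumption that should be checked against NVIDIA's CUDA Compatibility table. Both values are for Linux.

```python
# Minimum Linux driver version per CUDA major release under minor version
# compatibility. 12 -> 525.60.13 is cited in this thread; 11 -> 450.80.02
# is an assumption to verify against NVIDIA's CUDA Compatibility table.
MIN_LINUX_DRIVER = {
    11: (450, 80, 2),
    12: (525, 60, 13),
}


def parse_version(text):
    """Turn a dotted version string such as '525.60.13' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def driver_supports_toolkit(driver_version, toolkit_version):
    """Return True if the driver meets the minimum for the toolkit's major release."""
    major = parse_version(toolkit_version)[0]
    minimum = MIN_LINUX_DRIVER.get(major)
    if minimum is None:
        raise ValueError(f"no minimum driver known for CUDA {major}.x")
    return parse_version(driver_version) >= minimum


# A driver that meets the 12.x minimum is acceptable for 12.1 as well as 12.0.
print(driver_supports_toolkit("525.60.13", "12.1"))
```

Minor version compatibility has documented limitations, so a passing check here does not guarantee that every feature works.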

fermat97 · March 7, 2024, 8:20pm

Hi, I faced a similar issue (with an A100) and tried different versions of CUDA and PyTorch. Even with PyTorch 2.0.0 and CUDA 11.7, I still get the following error when running torch.cuda.is_available():

python3.10/site-packages/torch/cuda/__init__.py:107: UserWarning: CUDA initialization: CUDA driver initialization failed, you might not have a CUDA gpu. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:109.)
  return torch._C._cuda_getDeviceCount() > 0
False

ptrblck · March 7, 2024, 8:42pm

The issue points to a driver failure, so you might want to reinstall the driver.

The same issue was described here and indeed the driver wasn’t properly installed.
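
Before reinstalling, one way to check whether the driver responds at all is to query it through nvidia-smi, independently of PyTorch. The following is a minimal sketch, assuming nvidia-smi is on the PATH when the driver is installed; the helper name query_driver is hypothetical. It returns None rather than raising when the tool is missing or fails.

```python
import shutil
import subprocess


def query_driver():
    """Return one 'driver_version, name' line per GPU, or None on failure."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None  # nvidia-smi not found; the driver may not be installed
    try:
        proc = subprocess.run(
            [exe, "--query-gpu=driver_version,name", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if proc.returncode != 0:
        return None  # nvidia-smi could not talk to the driver
    return [line.strip() for line in proc.stdout.splitlines() if line.strip()]


gpus = query_driver()
if gpus is None:
    print("nvidia-smi is missing or failed; the driver is likely the problem")
else:
    for line in gpus:
        print(line)
```

If this reports no driver while the hardware is present, the failure is below PyTorch, which is consistent with the driver needing a reinstall.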