Compatibility between CUDA 12.6 and PyTorch

Hello,

I’m in the process of fine-tuning an LLM, and my machine has these specifications:

NVIDIA RTX A6000
NVIDIA-SMI 560.35.03
CUDA Version: 12.6

I’m having a hard time finding the right PyTorch packages that are compatible with my CUDA version.

Does anyone know which one I should download?
Any suggestions are welcome.

Thank you

Your locally installed CUDA toolkit won’t be used unless you build PyTorch from source or a custom CUDA extension, since the PyTorch binaries ship with their own CUDA runtime dependencies. You can thus select any PyTorch binary with CUDA 12.x and I would recommend using the latest one (12.4.1).
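
As a quick check (a minimal sketch using only the public torch API), you can confirm which CUDA runtime your installed binary ships with:

# torch.version.cuda reports the CUDA runtime bundled into the wheel,
# not the toolkit installed on the system, so the two can legitimately differ.
import torch

print(torch.__version__)          # e.g. 2.5.1+cu124
print(torch.version.cuda)         # e.g. 12.4 -- the bundled runtime
print(torch.cuda.is_available())  # True if the driver initializes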


So it won’t be a problem, right? I have CUDA version 12.6 too.

This is what it says, so what should I do now?

Could not find a version that satisfies the requirement torch==2.3.0+cu126 (from versions: 1.11.0, 1.12.0, 1.12.1, 1.13.0, 1.13.1, 2.0.0, 2.0.1, 2.1.0, 2.1.1, 2.1.2, 2.2.0, 2.2.1, 2.2.2, 2.3.0, 2.3.1, 2.4.0, 2.4.1, 2.5.0, 2.5.1)
ERROR: No matching distribution found for torch==2.3.0+cu126

You need to install a supported binary from PyTorch.org; it seems you are trying to manipulate the version tags yourself (there is no cu126 build of torch==2.3.0).
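
For example, the selector on PyTorch.org generates a command along these lines for the CUDA 12.4 wheels (a sketch; take the exact command from the page rather than editing version tags by hand):

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124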

Thanks a lot for replying. I was installing a text-to-speech model which said it needed PyTorch.
Well, I gave up and installed CUDA 11.8; let’s see if it works.

Hello @ptrblck I am having a similar issue and would appreciate your opinion.

I set up an EC2 instance on AWS (g4dn.xlarge, Amazon Linux, Deep Learning Base OSS Nvidia Driver GPU AMI) and got access to a T4 with CUDA 12.6. nvidia-smi confirms this:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.05              Driver Version: 560.35.05      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla T4                       On  |   00000000:00:1E.0 Off |                    0 |
| N/A   21C    P8             11W /   70W |       1MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

I went ahead and installed torch:

pip3 freeze | grep torch
pytorch-triton==3.2.0+git35c6c7c6
torch==2.5.1+cu124

and tried out a simple command:

python3 -c "import torch;print(torch.cuda.is_available())"

The output of this command is the following:

/home/ec2-user/.local/lib/python3.9/site-packages/torch/cuda/__init__.py:129: UserWarning: CUDA initialization: CUDA driver initialization failed, you might not have a CUDA gpu. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
  return torch._C._cuda_getDeviceCount() > 0
False

I have tried a few things listed in similar questions but have had no luck. I also find it interesting that I am not doing anything insanely complicated: it is an EC2 instance setup with a simple torch.cuda.is_available() invocation. Would you have any thoughts?

UserWarning: CUDA initialization: CUDA driver initialization failed...

Your setup is unable to initialize the driver, and any other CUDA application you try to run should raise the same error. I don’t know what your exact setup is, but make sure your environment or container can properly communicate with the NVIDIA driver.
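
One way to test this independently of PyTorch (a minimal sketch using ctypes; assumes libcuda.so.1 is on the library path, which a working driver install should guarantee):

# Call cuInit() from the driver library directly. If this fails, the
# problem is below PyTorch (driver, container runtime, or device visibility).
import ctypes

libcuda = ctypes.CDLL("libcuda.so.1")  # installed by the NVIDIA driver, not the toolkit
res = libcuda.cuInit(0)                # CUresult: 0 means CUDA_SUCCESS
print("cuInit ->", res)                # e.g. 100 = CUDA_ERROR_NO_DEVICE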

I apologize if the question has already been asked, but I currently use Python 3.12 and torch==2.5.1+cu124.

I don’t quite understand exactly how this works and whether it is correct.
The code runs without errors, and it has even gotten faster: about 20 seconds per iteration step on the CPU versus about 15 seconds on the GPU.
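
In case it matters, a minimal sketch of how such a step could be timed fairly (the matmul workload and sizes here are placeholders, not my actual model; torch.cuda.synchronize() is needed because CUDA kernels launch asynchronously):

import time
import torch

x = torch.randn(4096, 4096)

# CPU timing
t0 = time.perf_counter()
y = x @ x
cpu_s = time.perf_counter() - t0

# GPU timing: synchronize before and after, since kernels run asynchronously
xg = x.cuda()
torch.cuda.synchronize()
t0 = time.perf_counter()
yg = xg @ xg
torch.cuda.synchronize()
gpu_s = time.perf_counter() - t0

print(f"CPU: {cpu_s:.3f}s  GPU: {gpu_s:.3f}s")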


This is mostly an experiment, but it would be interesting to see if it works properly.
Thanks!

I’m not sure what your question is, but note that PyTorch binaries ship with their own CUDA runtime dependencies. Your locally installed CUDA toolkit won’t be used unless you build PyTorch from source or a custom CUDA extension.

Hello @mictad and @greek_freak, I was having the exact same issue as you. I finally figured out a fix. Here’s the solution:

  1. CUDA drivers are backward compatible: frameworks built against an earlier CUDA runtime (e.g., 12.1) can still run on GPUs and drivers that support a later CUDA version (e.g., 12.6). In your case:
  • Your GPU driver (560.x) supports CUDA 12.6.
  • PyTorch officially supports CUDA 12.1 as the latest compatible version, which is backward-compatible with your setup.
  2. Libraries like PyTorch built with CUDA 12.1 support run on systems with CUDA 12.6 because the newer driver includes support for all functionality in earlier CUDA versions (12.4, 12.1, 11.8).
  3. Since PyTorch has been extensively tested with CUDA 12.1, I recommend CUDA 12.1 for better stability and performance. When PyTorch uses CUDA 12.1, it relies on the GPU driver to provide the necessary APIs and kernel functions. Your CUDA 12.6-enabled driver fully supports these, so no additional configuration is needed. To install, see this link: Start Locally | PyTorch (a sample command is sketched below).
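
For reference, that page generates a command along these lines for the CUDA 12.1 wheels (a sketch; copy the exact command from the selector):

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121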

Could you please give examples of better stability and performance in our CUDA 12.1 builds compared to e.g. 12.4 or 12.6?

Available discussions and reports provide some insight into their relative performance and stability. Here are a few:

  1. A user compared PyTorch 2.4.0 with CUDA 12.4 against PyTorch 2.1.2 with CUDA 12.1 on an NVIDIA 4080 GPU. The findings indicated that the older version (PyTorch 2.1.2 with CUDA 12.1) was faster. Source: We could update to last Torch & Xformers? · AUTOMATIC1111/stable-diffusion-webui · Discussion #16394 · GitHub
  2. During the integration of CUDA 12.4, unexpected errors were encountered in PyTorch’s Inductor backend, leading to performance regressions. Source: CUDA 12.4 CI Inductor Issues · Issue #126692 · pytorch/pytorch · GitHub

And so on. More broadly, CUDA 12.4 and 12.6 introduce features and optimizations not available in 12.1, but PyTorch developers have tested CUDA 12.1 more extensively than 12.4 and 12.6, making 12.1 more stable and reliable.

I see the opposite:

I have installed PyTorch 2.4.0 with CUDA 12.4. On an RTX 4080, SD1.5 is about 15% to 20% faster, and SDXL is about 10% faster.

I also see all issues were closed.

I doubt it and recommend using the latest release.
E.g., CUDA 12.6U3 improved matmul performance significantly, and I would recommend sticking to this release.
If issues are seen, please let us know.

Personally, I haven’t experimented much with newer CUDA versions. From my research on the web and community platforms, I found CUDA 12.1 to be the most stable and well supported by the PyTorch developers, so I chose it. I’ll try newer versions if needed. Thanks!
