Compiling an older version of PyTorch (1.12) for newer CUDA architectures

Hello Everyone,

I am working on an older project that uses PyTorch version 1.12.0. This project works as expected on an Ubuntu system running NVIDIA-SMI version 450.80.02, Driver Version 450.80.02, and CUDA version 11.0. This system contains a Tesla V100 GPU.

I need to deploy this project to a newer machine that has NVIDIA-SMI 535.171.04, Driver Version 535.171.04, CUDA Version 12.2, and an NVIDIA RTX 6000 Ada GPU. When I deploy and run the project on this machine, I receive the following error:

CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
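
As an aside, setting the suggested flag before CUDA initializes (e.g. at the very top of the entry script) should make the stack trace point at the failing call:

    import os
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # must be set before CUDA is initialized

    import torch  # kernel launches are now synchronous, so errors surface where they occur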

Based on my research, this is because the PyTorch version I am using doesn’t support the NVIDIA RTX 6000 Ada GPU architecture (sm_89).
Upgrading to PyTorch 2.0+ is not trivial at the moment.

Questions / Asks:

  • Is it possible to compile PyTorch version 1.12.0 to work with newer CUDA architectures (e.g., sm_89, sm_90)?

  • Any recommendations or steps I can take to compile this exact build (e.g., is there a Docker container I can use to build a wheel, what args, etc.)?

  • Are there any existing builds that already have this PyTorch version and CUDA architecture support?

Install the PyTorch 1.12.0 binary with CUDA 11.6 runtime dependencies and I think it should work, as your GPU was already supported back then.
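
For reference, the matching install command (from the PyTorch previous-versions page) should look something like:

    pip install torch==1.12.0+cu116 --extra-index-url https://download.pytorch.org/whl/cu116

(the exact torchvision/torchaudio pins depend on your project).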

sm_90 won’t work as it was introduced in CUDA 11.8 which wasn’t released when PyTorch 1.12 was released.


Thank you, I will give that a shot!

I’m not seeing sm_89 as one of the supported architectures when using the PyTorch 1.12.0 binary with CUDA 11.6. My understanding is that sm_89 is required to work with the NVIDIA RTX 6000 Ada GPU. @ptrblck can you confirm?

No, the architecture does not need to be explicitly supported, as it’s compatible with sm_86 and sm_80. You can just run any example to verify it, assuming your system can communicate with your GPU.
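
To make “run any example” concrete, here is a minimal smoke test (assuming the 1.12.0+cu116 wheel suggested above); if the binary lacks a compatible architecture, the matmul raises the “no kernel image” error:

    import torch

    print(torch.__version__)                    # e.g. 1.12.0+cu116
    print(torch.cuda.get_device_name(0))        # should report the RTX 6000 Ada
    print(torch.cuda.get_device_capability(0))  # (8, 9) for Ada Lovelace
    print(torch.cuda.get_arch_list())           # architectures this binary ships kernels for

    # This actually launches kernels, so it fails loudly on an unsupported GPU.
    x = torch.randn(1024, 1024, device="cuda")
    torch.cuda.synchronize()
    print((x @ x).sum().item())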

According to the NGC PyTorch container image support matrix (Frameworks Support Matrix - NVIDIA Docs), the first/earliest version supporting Ada Lovelace is v22.10. This particular PyTorch container image supports:

  • PyTorch 1.13

  • CUDA runtime 11.8

  • Python 3.8.


Among the following variables:

  1. GPU device arch (e.g. RTX 6000 Ada) and compute capability (sm_89 for Ada)
  2. GPU driver version and CUDA driver version
  3. CUDA runtime version
  4. PyTorch version

The first two are not supposed to change in our context. If we can determine the minimum required CUDA runtime version (item 3), we can determine the PyTorch version (item 4) accordingly. So what is the minimum/earliest CUDA runtime version that supports our GPU (RTX 6000 Ada)?

The following pieces of evidence seem to indicate that it is CUDA 11.8:

  • Nvidia CUDA 11.8 release notes: “This release introduces support for both the Hopper and Ada Lovelace GPU families.”

  • CUDA wiki: The first CUDA version supporting Ada Lovelace is CUDA 11.8.

  • Nvidia Framework Support Matrix for NGC PyTorch containers: The first/earliest version of NGC PyTorch container supporting Ada Lovelace architecture is v22.10. In this container, PyTorch (v1.13) was built on CUDA 11.8 (on Ubuntu 20.04).
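
As a quick cross-check of item 3 against item 4: CUDA-enabled PyTorch binaries record the CUDA runtime they were built against, e.g.:

    import torch

    print(torch.__version__)   # e.g. 1.13.1+cu117
    print(torch.version.cuda)  # CUDA runtime the binary was compiled with, e.g. 11.7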

Without resorting to compiling PyTorch from source, we have the following options:

  1. Use NGC PyTorch container 22.10 or newer
  2. Use a PyTorch binary distribution: the minimum PyTorch binary version supporting CUDA 11.8 is PyTorch 2.0.0 (see the list below)

1.13.0+cpu, 1.13.0+cu116, 1.13.0+cu117, 1.13.0+cu117.with.pypi.cudnn, 1.13.0+rocm5.1.1, 1.13.0+rocm5.2, 1.13.1, 1.13.1+cpu, 1.13.1+cu116, 1.13.1+cu117, 1.13.1+cu117.with.pypi.cudnn, 1.13.1+rocm5.1.1, 1.13.1+rocm5.2, 2.0.0, 2.0.0+cpu, 2.0.0+cpu.cxx11.abi, 2.0.0+cu117, 2.0.0+cu117.with.pypi.cudnn, 2.0.0+cu118, 2.0.0+rocm5.3, 2.0.0+rocm5.4.2, 2.0.1, 2.0.1+cpu, 2.0.1+cpu.cxx11.abi, 2.0.1+cu117, 2.0.1+cu117.with.pypi.cudnn, 2.0.1+cu118, 2.0.1+rocm5.3, 2.0.1+rocm5.4.2, 2.1.0, 2.1.0+cpu, 2.1.0+cpu.cxx11.abi, 2.1.0+cu118, 2.1.0+cu121, 2.1.0+cu121.with.pypi.cudnn, 2.1.0+rocm5.5, 2.1.0+rocm5.6, 2.1.1, 2.1.1+cpu, 2.1.1+cpu.cxx11.abi, 2.1.1+cu118, 2.1.1+cu121, 2.1.1+cu121.with.pypi.cudnn, 2.1.1+rocm5.5, 2.1.1+rocm5.6, 2.1.2, 2.1.2+cpu, 2.1.2+cpu.cxx11.abi, 2.1.2+cu118, 2.1.2+cu121, 2.1.2+cu121.with.pypi.cudnn, 2.1.2+rocm5.5, 2.1.2+rocm5.6, 2.2.0, 2.2.0+cpu, 2.2.0+cpu.cxx11.abi, 2.2.0+cu118, 2.2.0+cu121, 2.2.0+rocm5.6, 2.2.0+rocm5.7, 2.2.1, 2.2.1+cpu, 2.2.1+cpu.cxx11.abi, 2.2.1+cu118, 2.2.1+cu121, 2.2.1+rocm5.6, 2.2.1+rocm5.7, 2.2.2, 2.2.2+cpu, 2.2.2+cpu.cxx11.abi, 2.2.2+cu118, 2.2.2+cu121, 2.2.2+rocm5.6, 2.2.2+rocm5.7, 2.3.0, 2.3.0+cpu, 2.3.0+cpu.cxx11.abi, 2.3.0+cu118, 2.3.0+cu121, 2.3.0+rocm5.7, 2.3.0+rocm6.0, 2.3.1, 2.3.1+cpu, 2.3.1+cpu.cxx11.abi, 2.3.1+cu118, 2.3.1+cu121, 2.3.1+rocm5.7, 2.3.1+rocm6.0

  1. Is there any flaw in the above analysis and conclusion?
  2. Is there any other option short of compiling PyTorch from source?

Side note: after testing with various versions of PyTorch/CUDA, I realize:

torch.cuda.is_available() returning True, or a successful run of torch.rand(2,3).cuda(), does not imply that the installed PyTorch fully supports the installed GPU. It only indicates that the installed CUDA driver (on the Docker host) is compatible with the installed CUDA runtime (in the container). That driver-to-runtime compatibility is easily met by any CUDA runtime version <= (not newer than) the CUDA driver version. For PyTorch to work with a given GPU device, PyTorch must have been compiled against a CUDA runtime version that supports the GPU device, or we get the error described here. As indicated above, the minimum version of CUDA supporting Ada Lovelace GPUs is CUDA 11.8.
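
A sketch of that check in code (this only tests SASS binary compatibility within a major compute capability and ignores PTX forward compatibility via compute_xx entries, so treat it as an approximation):

    import torch

    # True whenever driver and runtime are compatible, even if the binary
    # ships no kernels for this GPU.
    print(torch.cuda.is_available())

    major, minor = torch.cuda.get_device_capability(0)  # (8, 9) on Ada
    archs = torch.cuda.get_arch_list()                  # e.g. ['sm_37', ..., 'sm_80', 'sm_86']

    # SASS built for sm_XY runs on devices with the same major version and a
    # minor version >= Y, so an sm_80/sm_86 build covers an sm_89 device.
    def covers(arch: str) -> bool:
        cc = arch.split("_")[1]
        return int(cc[0]) == major and int(cc[1:]) <= minor

    print("usable kernels:", any(covers(a) for a in archs if a.startswith("sm_")))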

Please let me know if this is incorrect. Thanks.

While your explanation is correct that sm_89 was introduced in CUDA 11.8, it won’t matter for functionality, as we are neither building PyTorch for this compute capability explicitly (you can check it via torch.cuda.get_arch_list()) nor should we, since no performance improvements are expected.
Since sm_89 is binary compatible with sm_86 and sm_80 (as explained above), you should be able to execute any PyTorch binary supporting sm_86/sm_80.
CUDA math libs have certainly improved their heuristics etc. in newer CUDA versions, which is why I would not recommend going too far back.

So the TL;DR is still: @Arthur.Putnam should just run any simple example to verify that the build is working fine, or upgrade to any of our current binaries.


I was able to successfully get our application running on the new system using CUDA 11.7, which confirms @ptrblck’s statement about it working with the sm_86 architecture. Thank you @ptrblck and @William_Zhang for talking through the requirements.

In case others come here with similar issues, see the details below.

To get this to work, I used the NVIDIA container image for PyTorch version 22.05, which comes with Python 3.8, the CUDA 11.7 runtime, and the same version of PyTorch we previously used (1.12.x). You can look for different combinations of PyTorch and CUDA in their container repository. I did, however, need to manually recompile some of our Python dependencies against CUDA 11.7 instead of the CUDA 10.2 they had been compiled for, which meant building those libraries inside the Docker container.
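
For reference, pulling and starting that container should look something like:

    docker run --gpus all -it --rm nvcr.io/nvidia/pytorch:22.05-py3

(the exact image tag is my reading of the NGC catalog naming scheme, so double-check it on the catalog page).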
