Hello everyone,
I am working on an older project that uses PyTorch version 1.12.0. This project works as expected on an Ubuntu system running NVIDIA-SMI version 450.80.02, Driver Version 450.80.02, and CUDA version 11.0. This system contains a Tesla V100 GPU.
I need to deploy this project to a newer machine that has NVIDIA-SMI 535.171.04, Driver Version 535.171.04, CUDA Version 12.2 and an NVIDIA RTX 6000 Ada GPU. When I deploy and run this project on this machine, I receive the following error:
CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Based on my research this is because the PyTorch version I am using doesn’t support the NVIDIA RTX 6000 Ada GPU architecture (sm_89).
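To confirm this diagnosis on the new machine, a small check can compare the GPU's compute capability with the architectures the installed PyTorch build was compiled for. This is a minimal sketch: `has_kernels_for` and `main` are hypothetical helpers, while `torch.cuda.get_device_capability` and `torch.cuda.get_arch_list` are existing PyTorch functions. The check encodes NVIDIA's compatibility rules: SASS (`sm_XY`) runs on GPUs with the same major version and an equal or higher minor version, and PTX (`compute_XY`) can be JIT-compiled by the driver for any equal or newer GPU.

```python
"""Check whether the installed PyTorch build ships kernels usable on the local GPU."""
from typing import Iterable, Tuple


def has_kernels_for(capability: Tuple[int, int], arch_list: Iterable[str]) -> bool:
    """Return True if any entry in arch_list (e.g. "sm_86", "compute_70") can run on capability."""
    major, minor = capability
    for arch in arch_list:
        kind, _, digits = arch.partition("_")
        if len(digits) < 2 or not digits.isdigit():
            continue  # conservatively ignore unexpected or arch-specific entries such as "sm_90a"
        a_major, a_minor = int(digits[:-1]), int(digits[-1])
        if kind == "sm" and a_major == major and a_minor <= minor:
            return True  # SASS runs on same-major GPUs with an equal or higher minor version
        if kind == "compute" and (a_major, a_minor) <= (major, minor):
            return True  # PTX can be JIT-compiled by the driver for equal or newer GPUs
    return False


def main() -> None:
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
        return
    if not torch.cuda.is_available():
        print("No CUDA device is visible to PyTorch")
        return
    cap = torch.cuda.get_device_capability(0)
    archs = torch.cuda.get_arch_list()
    print(f"device capability: sm_{cap[0]}{cap[1]}")
    print(f"compiled arch list: {archs}")
    print("kernels available:", has_kernels_for(cap, archs))


if __name__ == "__main__":
    main()
```

If this reports `False` on the RTX 6000 Ada, the installed wheel has no usable kernels for sm_89; if it reports `True`, the error may come from some other compiled component rather than PyTorch itself.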
Upgrading to PyTorch 2.0+ is not trivial at the moment.
Questions / Asks:
- Is it possible to compile PyTorch version 1.12.0 to work with newer CUDA architectures (e.g. sm_89, sm_90)?
- Are there any recommendations or steps I can follow to compile this exact build (e.g. is there a Docker container I can use to build a wheel, which build arguments should I pass, etc.)?
- Are there any existing builds that already combine this PyTorch version with support for these CUDA architectures?
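For the first two questions, a source build would typically select target GPUs through the `TORCH_CUDA_ARCH_LIST` environment variable that PyTorch's `setup.py` reads. This sketch is untested against v1.12.0 and rests on assumptions: a recursive clone of the `v1.12.0` tag of https://github.com/pytorch/pytorch, its build requirements installed, and a CUDA toolkit of 11.8 or newer, since earlier `nvcc` releases do not accept compute capability 8.9 or 9.0. Whether the 1.12.0 sources compile cleanly against such a toolkit is not verified here. `arch_list_value`, `build_env`, and `build_wheel` are hypothetical helper names.

```python
"""Hypothetical helpers for building a PyTorch v1.12.0 wheel with extra CUDA architectures."""
import os
import subprocess
import sys
from typing import Dict, Mapping, Optional, Sequence


def arch_list_value(archs: Sequence[str], ptx_for_last: bool = True) -> str:
    """Format compute capabilities as a TORCH_CUDA_ARCH_LIST value, e.g. "7.0;8.9+PTX"."""
    items = list(archs)  # copy so the caller's sequence is not modified
    if not items:
        raise ValueError("at least one architecture is required")
    if ptx_for_last:
        # "+PTX" also embeds PTX for the newest entry, which the driver can JIT for newer GPUs.
        items[-1] += "+PTX"
    return ";".join(items)


def build_env(
    archs: Sequence[str],
    max_jobs: Optional[int] = None,
    base: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Return a copy of the environment (or of base) with the build variables set."""
    env = dict(os.environ if base is None else base)
    env["TORCH_CUDA_ARCH_LIST"] = arch_list_value(archs)
    if max_jobs is not None:
        # MAX_JOBS caps parallel compile jobs; each CUDA compile job can need a lot of RAM.
        env["MAX_JOBS"] = str(max_jobs)
    return env


def build_wheel(source_dir: str, env: Mapping[str, str]) -> None:
    """Run `setup.py bdist_wheel` in a PyTorch source checkout (assumed to be v1.12.0)."""
    subprocess.run(
        [sys.executable, "setup.py", "bdist_wheel"],
        cwd=source_dir,
        env=dict(env),
        check=True,
    )


if __name__ == "__main__":
    # 7.0 keeps the V100 working; 8.9 targets the RTX 6000 Ada (needs nvcc from CUDA >= 11.8).
    env = build_env(["7.0", "8.0", "8.6", "8.9"], max_jobs=8)
    print(env["TORCH_CUDA_ARCH_LIST"])
    # build_wheel("pytorch", env)  # uncomment to build inside a recursive v1.12.0 clone
```

Listing 7.0 alongside 8.9 keeps one wheel usable on both machines; `setup.py bdist_wheel` places the resulting wheel in the checkout's `dist/` directory.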