CUDA error: the provided PTX was compiled with an unsupported toolchain [in Nvidia Ampere]

Ardian_Umam · August 28, 2023, 12:50pm

Hi,

I tried to install pointnet2 (a major architecture for point cloud data), and the installation completed without any errors. However, after importing the ops via from pointnet2_ops import pointnet2_utils, I get this error when moving my model to the GPU:
CUDA error: the provided PTX was compiled with an unsupported toolchain.

From what I found, this error usually occurs when the CUDA version used by PyTorch is newer than the version supported by the driver installed on the machine. That is not the case for me, though. Here is my setup:

CUDA version on the Ubuntu machine: 11.3
CUDA version of the PyTorch install: 11.1
PyTorch version: 1.9.0
Python version: 3.9

This error occurs on an NVIDIA RTX A5000 GPU (Ampere architecture). With exactly the same setup on my other machine with an NVIDIA RTX 2080 Ti, everything works and no errors are raised. Any pointers on how to solve this on the A5000 machine?

Detailed error message:

Traceback (most recent call last):
  File "/home/aumam/dev/multimodal_distillation/model/model.py", line 457, in <module>
    net = net.cuda()
  File "/home/aumam/anaconda3/envs/pytorch1.9_4/lib/python3.9/site-packages/torch/nn/modules/module.py", line 637, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/home/aumam/anaconda3/envs/pytorch1.9_4/lib/python3.9/site-packages/torch/nn/modules/module.py", line 530, in _apply
    module._apply(fn)
  File "/home/aumam/anaconda3/envs/pytorch1.9_4/lib/python3.9/site-packages/torch/nn/modules/module.py", line 530, in _apply
    module._apply(fn)
  File "/home/aumam/anaconda3/envs/pytorch1.9_4/lib/python3.9/site-packages/torch/nn/modules/module.py", line 530, in _apply
    module._apply(fn)
  File "/home/aumam/anaconda3/envs/pytorch1.9_4/lib/python3.9/site-packages/torch/nn/modules/module.py", line 552, in _apply
    param_applied = fn(param)
  File "/home/aumam/anaconda3/envs/pytorch1.9_4/lib/python3.9/site-packages/torch/nn/modules/module.py", line 637, in <lambda>
    return self._apply(lambda t: t.cuda(device))
RuntimeError: CUDA error: the provided PTX was compiled with an unsupported toolchain.
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

Many thanks

ptrblck · August 28, 2023, 1:23pm

It seems Ampere GPUs are not supported, since TORCH_CUDA_ARCH_LIST is hardcoded to architectures up to sm_75 here.
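To see why the same build can work on the 2080 Ti (compute capability 7.5) but not on the A5000 (8.6), it helps to check which compiled code each GPU can use. A cubin (SASS) built for X.y only runs on GPUs with the same major version X and a minor version >= y, so sm_75 code cannot run on an 8.6 GPU; the driver then has to JIT-compile any embedded PTX instead. A plausible, unconfirmed reading of this thread is that this JIT step is what rejects PTX produced by a toolkit newer than the driver supports. The sketch below assumes numeric TORCH_CUDA_ARCH_LIST entries such as 7.5 or 3.7+PTX (named entries like Ampere are not handled), and ILLUSTRATIVE_LIST is a made-up list capped at 7.5, not the repository's actual value.

```python
import re
from typing import Set, Tuple

CC = Tuple[int, int]

# Hypothetical arch list of the same shape as a list capped at sm_75.
ILLUSTRATIVE_LIST = "3.7+PTX;5.0;6.0;6.1;6.2;7.0;7.5"


def parse_arch_list(arch_list: str) -> Tuple[Set[CC], Set[CC]]:
    """Split an arch list into (archs with SASS, archs with embedded PTX)."""
    sass: Set[CC] = set()
    ptx: Set[CC] = set()
    for entry in re.split(r"[;\s]+", arch_list.strip()):
        if not entry:
            continue
        major, minor = entry.replace("+PTX", "").split(".")
        cc = (int(major), int(minor))
        sass.add(cc)  # every entry produces a cubin for that architecture
        if entry.endswith("+PTX"):
            ptx.add(cc)  # "+PTX" additionally embeds PTX for that architecture
    return sass, ptx


def coverage(device_cc: CC, arch_list: str) -> str:
    """Return 'sass', 'ptx-jit' or 'unsupported' for a GPU and an arch list."""
    sass, ptx = parse_arch_list(arch_list)
    major, minor = device_cc
    # A cubin for X.y runs on X.z with z >= y (same major version only).
    if any(m == major and n <= minor for m, n in sass):
        return "sass"
    # PTX for any architecture <= the device can be JIT-compiled by the driver.
    if any(cc <= device_cc for cc in ptx):
        return "ptx-jit"
    return "unsupported"


if __name__ == "__main__":
    for name, cc in [("RTX 2080 Ti", (7, 5)), ("RTX A5000", (8, 6))]:
        print(name, coverage(cc, ILLUSTRATIVE_LIST))
    # prints: RTX 2080 Ti sass / RTX A5000 ptx-jit
```

Under these assumptions, appending 8.6 to the list moves the A5000 from the "ptx-jit" path to the "sass" path, so no driver-side JIT is needed.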
In any case, you’ve already cross-posted the question so I would expect the authors to know more about the limitations.

Ardian_Umam · August 29, 2023, 1:50am

Thanks a lot for pointing that out! I added “8.6” to TORCH_CUDA_ARCH_LIST, and that solved the issue.
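Adding the entry for the current GPU can also be scripted. The sketch below is a hypothetical helper (with_device_arch is not part of pointnet2_ops or PyTorch) that appends the GPU's compute capability, as reported by torch.cuda.get_device_capability, to a ';'-separated arch list. If the build script overwrites TORCH_CUDA_ARCH_LIST itself, as the earlier reply suggests, the resulting string still has to go into that hardcoded value; exporting it in the shell alone would be overridden.

```python
from typing import Optional, Tuple


def with_device_arch(arch_list: str, device_cc: Tuple[int, int]) -> str:
    """Append 'major.minor' to a ';'-separated arch list if it is missing."""
    entry = f"{device_cc[0]}.{device_cc[1]}"
    entries = [e for e in arch_list.split(";") if e]
    # Ignore a "+PTX" suffix so that "8.6+PTX" already counts as covering 8.6.
    if entry not in {e.replace("+PTX", "") for e in entries}:
        entries.append(entry)
    return ";".join(entries)


def current_device_cc() -> Optional[Tuple[int, int]]:
    """Compute capability of GPU 0, or None if PyTorch/CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


if __name__ == "__main__":
    cc = current_device_cc()
    if cc is None:
        print("No CUDA device visible to PyTorch")
    else:
        # Hypothetical base list; substitute the value your build script hardcodes.
        print(with_device_arch("7.0;7.5", cc))
```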

04RR · August 30, 2023, 8:30am

Hey, I’m facing the same issue, but I am using an RTX 4090 card. I installed the CUDA toolkit version 12.0 from here.

I have PyTorch 2.0 with pytorch-cuda=11.8 and Python 3.11.4.

PS: I just got this card and am still setting it up, so please let me know if I should install any additional drivers.

ptrblck · August 30, 2023, 2:42pm

I guess you are seeing this error from OpenAI/Triton? If so, note that they ship ptxas from CUDA 12, and you might need to update your driver as described here.
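One way to check which CUDA version the installed driver supports, without going through PyTorch, is to ask the driver library directly. This sketch assumes Linux (the library name libcuda.so.1) and uses the CUDA driver API function cuDriverGetVersion, which encodes the version as 1000 * major + 10 * minor. The (12, 0) comparison is an assumed lower bound for "ptxas from CUDA 12"; the exact version Triton bundles may be newer.

```python
import ctypes
from typing import Tuple


def decode_cuda_version(value: int) -> Tuple[int, int]:
    """CUDA encodes versions as 1000 * major + 10 * minor (12020 -> (12, 2))."""
    return value // 1000, (value % 1000) // 10


def driver_cuda_version() -> Tuple[int, int]:
    """Newest CUDA version supported by the installed driver (Linux only)."""
    libcuda = ctypes.CDLL("libcuda.so.1")  # raises OSError if no driver is installed
    version = ctypes.c_int()
    status = libcuda.cuDriverGetVersion(ctypes.byref(version))
    if status != 0:
        raise RuntimeError(f"cuDriverGetVersion failed with CUresult {status}")
    return decode_cuda_version(version.value)


def driver_can_jit(toolchain: Tuple[int, int], driver: Tuple[int, int]) -> bool:
    """PTX from a toolchain newer than the driver cannot be JIT-compiled by it."""
    return driver >= toolchain


if __name__ == "__main__":
    try:
        driver = driver_cuda_version()
    except (OSError, RuntimeError) as err:
        print("Could not query the driver:", err)
    else:
        print("driver supports CUDA", driver,
              "| can JIT CUDA 12.0 PTX:", driver_can_jit((12, 0), driver))
```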

04RR · August 30, 2023, 3:21pm

Hey, I got this error while trying to load a gguf model using ctransformers:

from ctransformers import AutoModelForCausalLM
llm = AutoModelForCausalLM.from_pretrained("TheBloke/Llama-2-7B-GGML", gpu_layers=50)

ptrblck · August 30, 2023, 3:57pm

Do you know if your call uses torch.compile or anything from OpenAI/Triton under the hood?
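A quick empirical answer to that question is to check sys.modules around the failing call: if triton was never imported into the process, its bundled ptxas is unlikely to be involved. run_and_report is a hypothetical helper, and this check only sees Python-level imports, not code running inside native libraries.

```python
import sys
from typing import Callable, Tuple


def run_and_report(fn: Callable[[], object]) -> Tuple[bool, bool]:
    """Run fn() and return (triton imported before, triton imported after)."""
    before = "triton" in sys.modules
    try:
        fn()
    except Exception as err:  # diagnostic only: report the failure and keep going
        print(f"call failed: {type(err).__name__}: {err}")
    after = "triton" in sys.modules
    return before, after


if __name__ == "__main__":
    # Hypothetical stand-in for the failing call, e.g.
    # lambda: AutoModelForCausalLM.from_pretrained(...)
    print(run_and_report(lambda: None))
```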

thanh_phat_ph_m · August 17, 2024, 5:35pm

Hi @ptrblck, I’m running into this problem as well.
If I run init.py directly, my tensor can be moved to CUDA without problems, but if I run another script that imports and calls init.py, the error appears.
Could you help me with this? Thanks for your help.

P.S.: output of nvidia-smi and nvcc --version:
NVIDIA-SMI 460.80 Driver Version: 460.80 CUDA Version: 11.2

Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
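The two outputs above report different things: the "CUDA Version" in the nvidia-smi header is the newest CUDA version the driver supports (11.2 here), while nvcc reports the version of the toolkit used to build extensions (11.8 here). A minimal sketch that parses both and flags a toolkit newer than the driver, the situation in which PTX JIT-compiled from that toolkit is expected to fail. The regexes assume the output formats shown in this post.

```python
import re
from typing import Tuple


def _version(pattern: str, text: str) -> Tuple[int, int]:
    match = re.search(pattern, text)
    if match is None:
        raise ValueError(f"pattern {pattern!r} not found")
    return int(match.group(1)), int(match.group(2))


def driver_cuda(smi_output: str) -> Tuple[int, int]:
    """Newest CUDA version the driver supports, from the nvidia-smi header."""
    return _version(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)


def toolkit_cuda(nvcc_output: str) -> Tuple[int, int]:
    """Toolkit version, from the 'release X.Y' part of nvcc --version."""
    return _version(r"release\s+(\d+)\.(\d+)", nvcc_output)


# Lines copied from the post above.
SMI = "NVIDIA-SMI 460.80 Driver Version: 460.80 CUDA Version: 11.2"
NVCC = "Cuda compilation tools, release 11.8, V11.8.89"

if __name__ == "__main__":
    driver, toolkit = driver_cuda(SMI), toolkit_cuda(NVCC)
    if toolkit > driver:
        print(f"toolkit {toolkit} is newer than the driver's CUDA {driver}; "
              "PTX JIT from this toolkit is expected to fail")
```

Live output can be fed in instead of the copied lines, for example via subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout.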

ptrblck · August 17, 2024, 9:06pm

Try to clear the caches for jit’ted kernels as well as kernels created by Triton.
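The reply does not name specific cache directories, so the sketch below collects commonly used on-disk locations as assumptions: Triton's cache (TRITON_CACHE_DIR, default ~/.triton/cache), TorchInductor's cache (TORCHINDUCTOR_CACHE_DIR, default torchinductor_<user> in the temp directory), PyTorch's JIT-built extensions (TORCH_EXTENSIONS_DIR, default ~/.cache/torch_extensions) and the CUDA driver's JIT cache (CUDA_CACHE_PATH, default ~/.nv/ComputeCache). These defaults may differ between versions and platforms, so the helper defaults to a dry run.

```python
import getpass
import os
import shutil
import tempfile
from pathlib import Path
from typing import Dict, List


def _user() -> str:
    try:
        return getpass.getuser()
    except Exception:  # no usable user name (e.g. some containers)
        return "unknown"


def _env_or(var: str, default: Path) -> Path:
    value = os.environ.get(var)
    return Path(value) if value else default


def kernel_cache_dirs() -> Dict[str, Path]:
    """Assumed cache locations; each can be overridden by its env variable."""
    home = Path.home()
    return {
        "triton": _env_or("TRITON_CACHE_DIR", home / ".triton" / "cache"),
        "torchinductor": _env_or(
            "TORCHINDUCTOR_CACHE_DIR",
            Path(tempfile.gettempdir()) / f"torchinductor_{_user()}"),
        "torch_extensions": _env_or(
            "TORCH_EXTENSIONS_DIR", home / ".cache" / "torch_extensions"),
        "cuda_compute_cache": _env_or(
            "CUDA_CACHE_PATH", home / ".nv" / "ComputeCache"),
    }


def clear_kernel_caches(dry_run: bool = True) -> List[Path]:
    """Delete existing cache dirs; return the ones removed (or that would be)."""
    targets = [p for p in kernel_cache_dirs().values() if p.is_dir()]
    if not dry_run:
        for path in targets:
            shutil.rmtree(path)
    return targets


if __name__ == "__main__":
    for path in clear_kernel_caches(dry_run=True):
        print("would remove", path)
```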

thanh_phat_ph_m · August 18, 2024, 5:23am

Thanks for your reply. I tried uninstalling Triton and clearing the caches for JIT-compiled kernels, but I still get the error:

import torch
torch._C._jit_clear_class_registry() # Clear the JIT class registry and pass managers

Could you give me more details on how to solve this? Thank you for your help.

ptrblck · August 19, 2024, 8:59pm

I haven’t encountered the issue myself, so I’m unsure what exactly is causing it.
Based on your config, I guess it could be related to OAI/Triton shipping ptxas from CUDA 12.x while your driver only supports CUDA 11.x, as described in this issue. You could try the workaround of setting TRITON_PTXAS_PATH to a ptxas binary from CUDA 11.
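A minimal sketch of that workaround, assuming a CUDA 11 toolkit is installed with its ptxas under <cuda_home>/bin. The path /usr/local/cuda-11.8 is a hypothetical example, and use_ptxas_from is a hypothetical helper. Setting the variable before torch or triton is imported ensures it is in place before Triton compiles anything.

```python
import os
from pathlib import Path


def use_ptxas_from(cuda_home: str) -> str:
    """Point Triton at <cuda_home>/bin/ptxas via TRITON_PTXAS_PATH."""
    ptxas = Path(cuda_home) / "bin" / "ptxas"
    if not (ptxas.is_file() and os.access(ptxas, os.X_OK)):
        raise FileNotFoundError(f"no executable ptxas at {ptxas}")
    os.environ["TRITON_PTXAS_PATH"] = str(ptxas)
    return str(ptxas)


if __name__ == "__main__":
    try:
        # Hypothetical CUDA 11 install location; adjust to your machine.
        print("TRITON_PTXAS_PATH =", use_ptxas_from("/usr/local/cuda-11.8"))
    except FileNotFoundError as err:
        print(err)
    # Import torch (and anything that uses Triton) only after this point.
```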

thanh_phat_ph_m · August 21, 2024, 6:39am

Thanks for your reply, I solved the problem. The cause was a mismatch between the CUDA version reported by nvidia-smi (11.2) and the one reported by nvcc --version (11.8), so the submodules I built could not move tensors to CUDA.
Thanks for your help.