PyTorch version error

shrbrh · May 31, 2023, 6:46am

Can anyone please tell me which version of PyTorch is compatible with an A100-PCIE-40GB with CUDA capability sm_80? I have tried installing different versions but keep getting the same error: “RuntimeError: CUDA error: no kernel image is available for execution on the device. CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1”
Even after passing CUDA_LAUNCH_BLOCKING=1, I get the same error: “RuntimeError: CUDA error: no kernel image is available for execution on the device.” Please help.

ptrblck · May 31, 2023, 7:02am

Every PyTorch build using CUDA>=11.0 is compatible with sm_80 and thus your A100.
The latest stable torch==2.0.1 release uses CUDA 11.7 and 11.8, and the current nightly builds use CUDA 11.8 and 12.1. All of them will work on your A100.
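
To see whether an installed build actually ships kernels for a given GPU, the architecture list compiled into the binary can be compared with the device's compute capability. The following is a minimal sketch, not an official compatibility check: the helper names `build_supports_device` and `check_installed_build` are hypothetical, and the matching rule is a simplification (an `sm_XY` entry is treated as usable on a device with the same major version and an equal or higher minor version, and a `compute_XY` PTX entry as usable on any equal or newer device). The architecture lists in the usage note are illustrative, not taken from a specific release.

```python
import re


def build_supports_device(arch_list, capability):
    """Return True if any entry in arch_list can run on a device with `capability`.

    arch_list:  strings such as "sm_70" or "compute_80"
    capability: (major, minor) tuple, e.g. (8, 0) for an A100
    Entries with unrecognised names (e.g. suffixed variants) are ignored.
    """
    major, minor = capability
    for arch in arch_list:
        match = re.fullmatch(r"(sm|compute)_(\d+)(\d)", arch)
        if match is None:
            continue
        kind = match.group(1)
        arch_major, arch_minor = int(match.group(2)), int(match.group(3))
        if kind == "sm":
            # Compiled kernels: same major version, minor not above the device's.
            if arch_major == major and arch_minor <= minor:
                return True
        elif (arch_major, arch_minor) <= (major, minor):
            # PTX can be JIT-compiled for an equal or newer device.
            return True
    return False


def check_installed_build():
    """Check the local PyTorch build against GPU 0; None if it cannot be checked."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return build_supports_device(
        torch.cuda.get_arch_list(), torch.cuda.get_device_capability(0)
    )
```

For example, `build_supports_device(["sm_37", "sm_50", "sm_60", "sm_70"], (8, 0))` returns `False`, while the same list with capability `(7, 0)` returns `True`.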

shrbrh · May 31, 2023, 7:32am

So is there any other reason why I might be getting this error? My script runs fine on a V100.

ptrblck · May 31, 2023, 7:56am

It depends on which PyTorch version you have installed; you can check it via python -m torch.utils.collect_env. E.g. if it’s an older PyTorch release built with CUDA 10.2, this error would be expected and you would need to update.
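
The same information can also be read from inside Python. The sketch below assumes the version string carries a local tag of the form `+cuXYZ` (as in `1.10.1+cu102`); the helpers `cuda_version_from_tag` and `needs_update_for_sm80` are hypothetical names introduced for illustration, and builds without such a tag are simply reported as unknown.

```python
import re


def cuda_version_from_tag(version):
    """Extract (major, minor) from a version string such as '1.10.1+cu102'.

    Returns None when there is no '+cuXYZ' tag (e.g. CPU-only builds).
    The last digit is read as the minor version, the rest as the major.
    """
    match = re.search(r"\+cu(\d+)(\d)$", version)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def needs_update_for_sm80(version):
    """True if the tagged CUDA version is older than 11.0, the minimum for sm_80."""
    cuda = cuda_version_from_tag(version)
    return cuda is not None and cuda < (11, 0)


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        # torch.version.cuda is None for CPU-only builds.
        print("PyTorch:", torch.__version__, "| built with CUDA:", torch.version.cuda)
        print("Needs update for sm_80:", needs_update_for_sm80(torch.__version__))
```

With the string `1.10.1+cu102`, `cuda_version_from_tag` gives `(10, 2)` and `needs_update_for_sm80` returns `True`.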

shrbrh · May 31, 2023, 10:07am

These are the details:

PyTorch version: 1.10.1+cu102
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

Do I need to update my PyTorch version?

ptrblck · May 31, 2023, 3:58pm

Yes, you would need to update the PyTorch binary and install one built with CUDA 11 or newer. The currently installed one uses CUDA 10.2, as I had guessed, and CUDA 10.2 predates sm_80 support.
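
After updating, one way to confirm that kernels now run on the GPU is to launch a trivial operation and force it to complete. This is only a rough sketch: `cuda_smoke_test` is a hypothetical helper, and the status strings it returns are made up for illustration.

```python
def cuda_smoke_test():
    """Run a tiny CUDA computation and return a short status string."""
    try:
        import torch
    except ImportError:
        return "torch-not-installed"
    if not torch.cuda.is_available():
        return "cuda-not-available"
    try:
        x = torch.ones(4, device="cuda")
        # .item() copies the result to the host, which forces the kernels
        # to finish, so an asynchronous CUDA error surfaces here.
        total = (x + 1).sum().item()
    except RuntimeError as err:
        return f"failed: {err}"
    return "ok" if total == 8.0 else "failed: unexpected result"


if __name__ == "__main__":
    print(cuda_smoke_test())
```

A build without kernels for the device would be expected to report the "no kernel image is available" message in the `failed:` status rather than `ok`.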