"No kernel image is available for execution on the device" with a custom CUDA kernel

Hi there,
I wrote a custom CUDA kernel for PyTorch 2.3.0 and CUDA 12.1.
My nvcc version:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Fri_Jun_14_16:34:21_PDT_2024
Cuda compilation tools, release 12.6, V12.6.20
Build cuda_12.6.r12.6/compiler.34431801_0

I compiled it via setuptools, e.g.:

from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

setup(
    name='test',
    ext_modules=[
        CUDAExtension('test_k', [
            'test.cpp',
            'test_cuda_kernel.cu'
        ])
    ],
    cmdclass={
        'build_ext': BuildExtension
    })

The kernel was compiled on an RTX 3090. However, when I try to run it on an A100, I get the following error:
RuntimeError: CUDA error: no kernel image is available for execution on the device
What am I doing wrong?

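When TORCH_CUDA_ARCH_LIST is not set, the extension is built only for the GPUs visible at build time, which here is the RTX 3090 (compute capability 8.6, sm_86). The A100 is sm_80, and a binary built only for sm_86 will not run on it. Compile your kernel for sm_80 as well by setting TORCH_CUDA_ARCH_LIST="8.0;8.6" in the environment before you build.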
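
If you would rather not export the variable in the shell each time, one option is to set it at the top of setup.py before setup(...) is called. This is only a minimal sketch: the helper arch_list and the TARGETS list are hypothetical names introduced for illustration and are not part of PyTorch. It assumes you want to target exactly the two GPUs mentioned in this thread.

```python
import os


def arch_list(capabilities):
    """Format (major, minor) compute capabilities as a TORCH_CUDA_ARCH_LIST string.

    Hypothetical helper for illustration; it is not a PyTorch API.
    """
    unique = sorted(set(capabilities))
    return ";".join(f"{major}.{minor}" for major, minor in unique)


# Assumed targets from this thread: RTX 3090 (8.6) and A100 (8.0).
TARGETS = [(8, 6), (8, 0)]

# setdefault keeps any value already exported in the shell,
# so a manual override on the command line still wins.
os.environ.setdefault("TORCH_CUDA_ARCH_LIST", arch_list(TARGETS))

# ... the setup(...) call from the question would follow here unchanged.
```

With these targets, arch_list(TARGETS) produces the same "8.0;8.6" string as above. After rebuilding, the extension should contain kernels for both GPUs, at the cost of a somewhat longer compile.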

Thank you very much :smile: