CUDAExtension for multiple GPU Architectures

Ahoy :slight_smile:
I’m working on a project that needs some additional CUDA code, and I want to compile it during the installation of my Python package for multiple GPU architectures.

The whole setup works fine on my local GPU (RTX 2080 Ti, CUDA 10.1), but when my job runs on a different GPU model (e.g. on our cluster) it crashes with the following message:

RuntimeError: CUDA error: no kernel image is available for execution on the device
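
As far as I understand, this error means the compiled extension contains no kernel image for the compute capability of the GPU it runs on. A quick way to check what the cluster GPU actually reports:

```python
import torch

# Print the compute capability of the first visible GPU; the extension
# must have been built for this arch (or ship PTX for a lower one) for
# its kernels to load.
major, minor = torch.cuda.get_device_capability(0)
print(torch.cuda.get_device_name(0), f"compute capability {major}.{minor}")
```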

I tried export TORCH_CUDA_ARCH_LIST="3.5;3.7;5.0;5.2;6.0+PTX;6.1+PTX;7.0+PTX;7.5+PTX", but it still does not work (my setup.py is very similar to the detectron2 one: https://github.com/facebookresearch/detectron2/blob/master/setup.py). Any idea what the problem might be? Are there additional steps needed to make sure it really compiles for multiple architectures? (I’m running a trial-and-error setup right now.)
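
For context, here is a minimal sketch of the kind of setup.py I mean (the package name and source paths are placeholders, not my real layout). torch.utils.cpp_extension reads TORCH_CUDA_ARCH_LIST at build time, so setting it inside setup.py has the same effect as exporting it in the shell:

```python
# Minimal sketch of a setup.py using CUDAExtension for multiple archs.
# Names and source files below are placeholders.
import os
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

# setdefault() keeps a shell export of TORCH_CUDA_ARCH_LIST authoritative
# while providing a fallback when nothing is exported.
os.environ.setdefault(
    "TORCH_CUDA_ARCH_LIST",
    "3.5;3.7;5.0;5.2;6.0+PTX;6.1+PTX;7.0+PTX;7.5+PTX",
)

setup(
    name="my_package",  # placeholder
    ext_modules=[
        CUDAExtension(
            name="my_package._C",  # placeholder
            sources=[
                "csrc/ops.cpp",        # placeholder
                "csrc/ops_kernel.cu",  # placeholder
            ],
        )
    ],
    cmdclass={"build_ext": BuildExtension},
)
```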

Thank you in advance :slight_smile:

Apparently it was some kind of problem with an old cached build … it works now …
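
For anyone who lands here with the same symptom: forcing a completely clean rebuild resolved it. A small cleanup sketch (assuming a standard setup.py layout; the glob patterns are the usual build artifacts) to run from the package root before reinstalling with pip install -v .:

```python
# Remove stale build artifacts so the CUDA extension is recompiled
# from scratch on the next install.
import shutil
from pathlib import Path

for pattern in ("build", "dist", "*.egg-info"):
    for path in Path(".").glob(pattern):
        print(f"removing {path}")
        shutil.rmtree(path, ignore_errors=True)
```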