I’m working on a project that needs some additional CUDA code, and I want to compile it during the installation of my Python package for multiple GPU architectures.
The whole setup works fine on my local GPU (RTX 2080 Ti, CUDA 10.1), but when my job runs on a different GPU model (e.g. on our cluster) it crashes with the following message:
RuntimeError: CUDA error: no kernel image is available for execution on the device
I already tried setting export TORCH_CUDA_ARCH_LIST="3.5;3.7;5.0;5.2;6.0+PTX;6.1+PTX;7.0+PTX;7.5+PTX", but it still does not work (my setup.py is very similar to the detectron2 one: https://github.com/facebookresearch/detectron2/blob/master/setup.py). Any idea what the problem might be? Are there any additional steps needed to ensure that it really compiles for multiple architectures (I’m running a trial-and-error setup right now)?
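For reference, here is a minimal sketch of the kind of setup.py I mean, in the style of the detectron2 one. The package name "my_package" and the source path "csrc/kernels.cu" are placeholders, not from my actual project. The key point I’m unsure about is whether TORCH_CUDA_ARCH_LIST is actually visible to the build process:

```python
# Hypothetical minimal setup.py using PyTorch's CUDAExtension.
# "my_package" and "csrc/kernels.cu" are placeholder names.
import os
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

# TORCH_CUDA_ARCH_LIST must be set in the environment of the process
# that compiles the extension; exporting it after the wheel is already
# built has no effect on the compiled binary.
os.environ.setdefault(
    "TORCH_CUDA_ARCH_LIST",
    "3.5;3.7;5.0;5.2;6.0+PTX;6.1+PTX;7.0+PTX;7.5+PTX",
)

setup(
    name="my_package",
    ext_modules=[
        CUDAExtension(
            name="my_package._C",
            sources=["csrc/kernels.cu"],
        )
    ],
    cmdclass={"build_ext": BuildExtension},
)
```

When reinstalling I use `pip install --no-cache-dir --force-reinstall .` so that pip does not reuse a wheel that was compiled earlier for only one architecture, and I check which architectures actually ended up in the binary with `cuobjdump --list-elf` on the resulting .so file.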
Thank you in advance