Setting up a C++/CUDA extension in a PyPI package

I have a simple package consisting of a few C++/CUDA algorithms, all wrapped up to run on torch tensors in Python. I would like to publish this as a PyPI package. I assume I want a source distribution only, as this means the code will be compiled on the user's system and so will build against the correct CUDA version. However, whenever I try to install the test package I've uploaded to test.pypi I get the following error…

```
RuntimeError:
The detected CUDA version (11.4) mismatches the version that was used to compile
PyTorch (10.2). Please make sure to use the same CUDA versions.
```

I'm not sure how to solve this. I can “pip install torch” and build my source package just fine locally, so why doesn't it work when building the wheel during a pip install?
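
For reference, here is a minimal sketch of the kind of setup.py I'm describing, using torch's CUDAExtension/BuildExtension helpers (the package and file names are placeholders, not my actual layout):

```python
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

setup(
    name="my_cuda_algos",  # placeholder package name
    ext_modules=[
        CUDAExtension(
            name="my_cuda_algos._C",
            # placeholder sources: C++ bindings plus a CUDA kernel file
            sources=["src/ops.cpp", "src/ops_kernel.cu"],
        )
    ],
    # BuildExtension drives the mixed C++/CUDA compilation
    cmdclass={"build_ext": BuildExtension},
)
```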

Based on the error message it seems your local CUDA toolkit version (used to build the package) is 11.4 while you are using the PyTorch pip wheels with the CUDA 10.2 runtime.
I don’t know why your local setup works, as I would expect to see the same error. Are you using different CUDA Toolkits or different PyTorch installations in your local setup?

Nope, I’m publishing and installing in the same environment.

When you say used to “build the package”, do you mean used to build the package which is uploaded to PyPI, or used to build the package when the user installs it?

I think my understanding was that the C++/CUDA code would be compiled on the user's machine, but from what you're saying I'm getting the feeling that it is compiled when I publish the package. In which case, do I have to publish a version for every version of CUDA?

Building the package before publishing it would allow users to run your code directly without installing a local CUDA toolkit (e.g. as is done with the PyTorch binaries).
If you want the user to build it locally, the easier approach might be to have them build your package from source directly from your repository.
You could also try to lazily build and load the extensions in your code, but you should hit the same error of mismatching CUDA versions.

Yes, I had sort of settled on just JIT compiling the first time the user calls the code. It seems like there could be a better solution, though. Perhaps there's a way for extension packages to piggyback off the work already done to make torch a seamless install.
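
Something like this minimal sketch is what I have in mind, using torch.utils.cpp_extension.load to compile on first use (the extension name, file paths, and op name are hypothetical):

```python
import torch
from torch.utils.cpp_extension import load

_backend = None

def _get_backend():
    """Compile and load the extension on first use, then cache the module."""
    global _backend
    if _backend is None:
        _backend = load(
            name="my_cuda_ext",  # hypothetical extension name
            sources=["my_ext/ops.cpp", "my_ext/ops_kernel.cu"],  # hypothetical files
            verbose=True,
        )
    return _backend

def my_op(x: torch.Tensor) -> torch.Tensor:
    # Forwards to the compiled extension; my_op is a hypothetical kernel
    return _get_backend().my_op(x)
```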

I'm assuming the CUDA included in the torch binaries does not include the tools required to build the extension?

That’s correct. The binaries do not include e.g. the CUDA compiler (nvcc), but ship with the CUDA runtime.
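
You can see the split from Python: torch.version.cuda reports the runtime the wheel was built against, while compiling an extension needs a full local toolkit, which torch locates via CUDA_HOME (the values in the comments are just examples):

```python
import torch
from torch.utils.cpp_extension import CUDA_HOME

# CUDA runtime version the installed PyTorch wheel was built against
print(torch.version.cuda)  # e.g. "10.2"

# Location of a full local CUDA toolkit (with nvcc), or None if not found;
# this is what building an extension actually requires
print(CUDA_HOME)  # e.g. "/usr/local/cuda-11.4"
```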

Okay, so the reason it was working fine when installing locally in my dev environment is that the venv I was developing in had torch 1.9, which doesn't do the CUDA version check that is present in torch 1.10 in cpp_extension.py (line 781). The code was running fine once compiled despite the CUDA mismatch. I assume this could be quite unstable, and there's no guarantee that code compiled by a different version of CUDA will interop well, hence the inclusion of the version check in 1.10.
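
For anyone else hitting this, here is a rough, simplified sketch of the kind of comparison that check performs (not the actual torch code): it parses the local nvcc version from CUDA_HOME and compares it against torch.version.cuda.

```python
import re
import subprocess

import torch
from torch.utils.cpp_extension import CUDA_HOME

def check_cuda_version() -> None:
    # Skip the check on CPU-only builds or when no local toolkit is found
    if CUDA_HOME is None or torch.version.cuda is None:
        return
    # Ask the local nvcc for its version string
    out = subprocess.check_output([f"{CUDA_HOME}/bin/nvcc", "--version"]).decode()
    match = re.search(r"release (\d+\.\d+)", out)
    if match and match.group(1) != torch.version.cuda:
        raise RuntimeError(
            f"The detected CUDA version ({match.group(1)}) mismatches the "
            f"version that was used to compile PyTorch ({torch.version.cuda})."
        )
```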