Are the requirements for using `torch.utils.cpp_extension` with CUDA documented anywhere?

When I install PyTorch via Conda with `conda install pytorch torchvision torchaudio pytorch-cuda=12.4 -c pytorch -c nvidia`, I am able to use CUDA devices out of the box, without installing anything at the system level. However, it seems that `torch.utils.cpp_extension.load_inline` will not work without some further setup. Through trial and error, I have found that I need to (A) install CUDA on the system outside of Conda, and (B) install `gcc_linux-64` and `gxx_linux-64` through Conda.

Without (A), I get various errors when attempting to compile CUDA code with `cpp_extension.load_inline`, e.g. `No such file or directory <nv/target>`, or missing header files when including CUB/Thrust. Without (B), I get the error `GLIBCXX_3.4.32` not found.

Are these requirements correct, and if so, are they documented anywhere? (I couldn’t see anything specific in the docs besides needing to install `ninja`.) If it’s correct that CUDA needs to be installed at the system level, then this could be quite annoying in terms of version mismatches, and I don’t really see why this would be necessary for `cpp_extension` but not for anything else. I did try installing `cuda-toolkit` and `cuda-cccl` via Conda, but to no avail.

In case anyone comes across this in the future, I did eventually find out how to make `cpp_extension.load_inline` use the version of CUDA installed through Conda:

The issue is that `nvidia::cuda-toolkit` doesn’t have all the necessary headers, but `conda-forge::cuda-toolkit` does, so you need to install the conda-forge version instead (I have no idea why these are different). The conda-forge version will also result in `which nvcc` returning the version in your Conda env, which enables `cpp_extension.py` to find it (so you don’t need to manually set the `CUDA_HOME` environment variable like you do with `nvidia::cuda-toolkit`).
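For anyone debugging the same thing, here is a rough diagnostic sketch that checks whether a CUDA install actually ships the headers mentioned in this thread. It only uses the standard library. The lookup order (`CUDA_HOME`, then `CUDA_PATH`, then the `nvcc` on `PATH`, then `/usr/local/cuda`) is my approximation of what `cpp_extension.py` does, not a copy of it, and the header list and the `targets/*/include` layout are assumptions; the function names are made up for illustration.

```python
import os
import shutil
from pathlib import Path

# Headers reported as missing in this thread; the list is illustrative only.
REQUIRED_HEADERS = ("cuda_runtime.h", "nv/target", "cub/cub.cuh", "thrust/version.h")


def guess_cuda_home():
    """Approximate the CUDA root lookup: env vars, then nvcc on PATH, then a default."""
    for var in ("CUDA_HOME", "CUDA_PATH"):
        value = os.environ.get(var)
        if value:
            return Path(value)
    nvcc = shutil.which("nvcc")
    if nvcc:
        # <root>/bin/nvcc -> <root>
        return Path(nvcc).parent.parent
    default = Path("/usr/local/cuda")
    return default if default.exists() else None


def candidate_include_dirs(cuda_home):
    """Include directories to search (assumed layout: include/ and targets/*/include)."""
    cuda_home = Path(cuda_home)
    dirs = [cuda_home / "include"]
    targets = cuda_home / "targets"
    if targets.is_dir():
        dirs.extend(sorted(targets.glob("*/include")))
    return [d for d in dirs if d.is_dir()]


def missing_headers(cuda_home, required=REQUIRED_HEADERS):
    """Return the required headers that are not found in any candidate include dir."""
    include_dirs = candidate_include_dirs(cuda_home)
    return [h for h in required if not any((d / h).is_file() for d in include_dirs)]
```

Calling `missing_headers(guess_cuda_home())` inside the Conda env should return an empty list if the toolkit is complete; a non-empty list points at the headers the compile step is likely to complain about. `guess_cuda_home()` can return `None` when nothing is found, so check for that first.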

My understanding from the Conda Section in the CUDA Installation Guide is that nvidia::cuda is the preferred way to install all available packages for native CUDA development.

I also had issues with missing `cuda_runtime.h` and `<nv/target>` for `nvidia::cuda==12.4`. But the newest `nvidia::cuda` also includes `nvidia::cuda-runtime`, which solved all issues for me.

To compile against the same runtime as the PyTorch installation, you can explicitly include its path:

from nvidia.cuda_runtime.include import __path__ as CUDA_RUNTIME_INCLUDES # .../site-packages/nvidia/cuda_runtime/include/

This should work for `nvidia-cuda-runtime-cu12` and for other dependencies of PyTorch that are distributed as pip wheels (see Sec. 7 of the CUDA Installation Guide).
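To make the include-path idea concrete, here is a minimal sketch of passing those wheel directories to `load_inline` through its `extra_include_paths` argument. It assumes the wheels install their headers under `<package>/include`, as in the path shown above. The helper names, the extension name, and the tiny kernel-free CUDA source are all hypothetical, and I have not verified this on every setup, so treat it as a starting point.

```python
import importlib.util
from pathlib import Path


def wheel_include_dirs(packages=("nvidia.cuda_runtime",)):
    """Collect existing <package>/include directories for the given pip wheels."""
    found = []
    for name in packages:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # Raised when the parent package (e.g. `nvidia`) is not installed.
            spec = None
        if spec is None or not spec.submodule_search_locations:
            continue
        for location in spec.submodule_search_locations:
            include = Path(location) / "include"
            if include.is_dir():
                found.append(str(include))
    return found


def build_example():
    """Compile a tiny extension against the wheel headers (needs torch, nvcc, ninja)."""
    from torch.utils.cpp_extension import load_inline  # imported lazily on purpose

    cuda_src = r"""
    int runtime_version() {
        int version = 0;
        cudaRuntimeGetVersion(&version);
        return version;
    }
    """
    return load_inline(
        name="runtime_version_ext",             # hypothetical extension name
        cpp_sources="int runtime_version();",   # declaration for the binding
        cuda_sources=cuda_src,
        functions=["runtime_version"],
        extra_include_paths=wheel_include_dirs(),
    )
```

`wheel_include_dirs()` returns an empty list when the wheel is not installed, in which case `load_inline` falls back to whatever headers the toolkit itself provides. `build_example()` is only worth calling once `nvcc` is discoverable, since the headers alone are not enough to compile.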