Not able to include cusolverDn.h

PyTorch94 · December 28, 2022, 6:56am

I installed PyTorch on Ubuntu 20.04 using both the conda command and the pip3 command (in a conda environment).
However, I am not able to find the cusolverDn.h header anywhere inside the anaconda3 folder.
I am using the ATen/cuda/CUDAContext.h header file in a CUDA implementation, and it depends on cusolverDn.h (as the error message says).
Please help in rectifying the issue.

ptrblck · December 28, 2022, 7:53am

The cuSOLVER headers would ship with your locally installed CUDA toolkit. Could you explain your use case a bit more as it seems you are trying to build PyTorch from source using the conda binaries?

PyTorch94 · December 28, 2022, 9:55am

I was trying to work with the official deformable-DETR network, which has the deformable attention module as a CUDA implementation. While implementing it, they included the torch library header files for ATen and others, including the one mentioned in the query.
I am working on a system with Ubuntu 20.04 and anaconda as the package manager (conda 22.11.1). The nvcc version is v10.1.243. I tried installing PyTorch using both the conda command and the pip3 command given on the official website. I didn’t install it from source. In both cases, VS Code IntelliSense reported that cusolverDn.h cannot be opened. When I searched for the file, I could not find it.

ptrblck · December 28, 2022, 7:24pm

The PyTorch binaries do not ship with the entire CUDA toolkit and all of its headers.
If you want to build Deformable-DETR from source, make sure to install PyTorch as one dependency, but use your locally installed CUDA toolkit (including the cuSOLVER headers) to build the DETR lib.

HeyangQin · January 5, 2023, 9:05pm

I think this may be related to the recent change in the conda packages. I just created a new conda environment, installed PyTorch according to the official documentation (conda install pytorch torchvision torchaudio pytorch-cuda=11.6 -c pytorch -c nvidia) and tried to compile apex from source. The compile command issued by pip looks like this:

/vc_data/users/heyangqin/anaconda3/envs/deepspeed/bin/nvcc  -I/vc_data/users/heyangqin/anaconda3/envs/deepspeed/lib/python3.10/site-packages/torch/include -I/vc_data/users/heyangqin/anaconda3/envs/deepspeed/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/vc_data/users/heyangqin/anaconda3/envs/deepspeed/lib/python3.10/site-packages/torch/include/TH -I/vc_data/users/heyangqin/anaconda3/envs/deepspeed/lib/python3.10/site-packages/torch/include/THC -I/vc_data/users/heyangqin/anaconda3/envs/deepspeed/include -I/vc_data/users/heyangqin/anaconda3/envs/deepspeed/include/python3.10 -c -c /vc_data/users/heyangqin/apex/csrc/multi_tensor_sgd_kernel.cu -o /vc_data/users/heyangqin/apex/build/temp.linux-x86_64-cpython-310/csrc/multi_tensor_sgd_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -lineinfo -O3 --use_fast_math -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 -std=c++14

This compile command calls the nvcc in the conda env, and it does not include the system CUDA directory /usr/local/cuda/include/, where cusolverDn.h is located, which causes the error. So I manually updated the PATH with export PATH=/usr/local/cuda/bin:$PATH and the error is gone. I wonder if this is the intended behavior?
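A minimal sketch of how one might check which nvcc comes first on PATH and whether the toolkit it belongs to actually ships cusolverDn.h. The helper names are hypothetical, the `<root>/bin/nvcc` layout is an assumption, and only the two include layouts mentioned in this thread are checked.

```python
import os
import shutil


def toolkit_root_from_nvcc(nvcc_path):
    """Return the toolkit root implied by an nvcc path, assuming <root>/bin/nvcc."""
    real = os.path.realpath(nvcc_path)  # resolve symlinks such as /usr/local/cuda
    return os.path.dirname(os.path.dirname(real))


def has_cusolver_header(toolkit_root):
    """Check the two include layouts that appear in this thread."""
    candidates = [
        os.path.join(toolkit_root, "include", "cusolverDn.h"),
        os.path.join(toolkit_root, "targets", "x86_64-linux", "include", "cusolverDn.h"),
    ]
    return any(os.path.isfile(path) for path in candidates)


if __name__ == "__main__":
    nvcc = shutil.which("nvcc")  # first nvcc found on PATH, or None
    if nvcc is None:
        print("nvcc is not on PATH")
    else:
        root = toolkit_root_from_nvcc(nvcc)
        print(f"nvcc: {nvcc}")
        print(f"toolkit root: {root}")
        print(f"cusolverDn.h present: {has_cusolver_header(root)}")
```

If the reported toolkit root is inside the conda environment and the header is missing, that would match the situation described in the post above.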

ptrblck · January 6, 2023, 12:58am

Your explanation sounds right, and I don’t think nvcc is supposed to ship as a dependency and be usable for source builds from the conda environment (at least we are not testing this use case).
Using the local CUDA toolkit is the right approach and we should try to filter out some packages for the next release.

senthilps8 · January 20, 2023, 8:28pm

I ran into the same issue because nvcc has been packaged with PyTorch in the latest release. In my case, the fix was not as trivial as updating the PATH variable. A package compiled with the local nvcc throws this error: provided PTX was compiled with an unsupported toolchain. From this link, it looks like the issue might be that the driver version expected by the PyTorch nvcc is different from the one expected by the local nvcc. Unfortunately, I cannot update my driver and I still cannot figure out a real solution, but I have a temporary workaround for my use case. Just leaving it here in case others run into this. Hopefully the next releases will go back to not including nvcc.

hamedB · January 22, 2024, 6:25pm

Dear @ptrblck

I am facing the same cusolverDn.h issue. I am trying to install the deformable convolution from this link. It has a setup file in the folder "Deformable-Convolution-V2-PyTorch".
The exact error I get is as follows.

"LOCALPATH/anaconda3/envs/pytorch_1_13/lib/python3.9/site-packages/torch/include/ATen/cuda/CUDAContext.h:10:10: fatal error: cusolverDn.h: No such file or directory

10 | #include <cusolverDn.h>
| ^~~~~~~~~~~~~~
compilation terminated.
error: command ‘/usr/bin/gcc’ failed with exit code 1
"

As the error shows, cusolverDn.h does not exist. However, the file CUDAContext.h does exist.
I have upgraded my nvidia cuda version to 12.2. My pytorch version is 1.13.1 and the pytorch-cuda is 11.6.

When I run the setup.py file from the terminal, I activate my anaconda env, so it uses the locally installed CUDA toolkit. Can you explain what else I can try to fix this error?

Thanks a lot for your attention.

ptrblck · January 22, 2024, 6:31pm

You could search for this file on your system, which should point to the locally installed CUDA toolkit, and add the path to the includes if needed (I would assume the 3rd party repository already does this).
E.g. on my system I see:

find /usr/ -name cusolverDn.h
/usr/local/cuda-12.3/targets/x86_64-linux/include/cusolverDn.h
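The same search can be sketched in Python when several roots need to be checked at once (for example a system prefix and a conda environment). `find_header` is a hypothetical helper written for illustration, not part of PyTorch or the CUDA toolkit.

```python
import os


def find_header(name, roots):
    """Walk each root and collect every path whose file name equals `name`."""
    hits = []
    for root in roots:
        # os.walk silently yields nothing for roots that do not exist
        for dirpath, _dirnames, filenames in os.walk(root):
            if name in filenames:
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)
```

Calling `find_header("cusolverDn.h", ["/usr/local", "/opt"])` would list every copy under those roots. The roots here are only examples, and an empty result suggests that no full CUDA toolkit is installed there.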

hamedB · January 22, 2024, 6:39pm

Thanks @ptrblck for your quick response.
I can not find cusolverDn.h in my system.
I just installed anaconda 3, followed by installing an env. including pytorch libraries.
Would you please tell me what other package(s) I should install to have cusolverDn.h?

Thanks again.

ptrblck · January 22, 2024, 6:41pm

You should install a full CUDA toolkit from: https://developer.nvidia.com/cuda-downloads

hamedB · January 22, 2024, 6:56pm

Thanks a lot @ptrblck. I have just installed the Base Installer package from the link, and cusolverDn.h was added at the following location on my machine:

/usr/local/cuda-12.3/targets/x86_64-linux/include/cusolverDn.h

I didn’t reinstall the NVIDIA driver, as I already have it.
A silly question: the part about adding the path to the includes is not clear to me. Would you explain where I should add it?

hamedB · January 22, 2024, 7:44pm

@ptrblck I added the path containing this .h file to $PATH. However, I still receive the same error as before. I am not sure I understood what you meant by adding it to the includes.
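On the "where do the includes go" question: PATH is used to locate executables such as nvcc, while header search directories are handed to the compiler separately, for example as -I flags or through the include_dirs argument of an extension in setup.py. The sketch below only computes candidate include directories. `cuda_include_dirs` is a hypothetical helper, the /usr/local/cuda default is an assumption, and the setup.py of the repository in question is not shown in this thread, so the commented usage is illustrative only.

```python
import os


def cuda_include_dirs(default_home="/usr/local/cuda"):
    """Return existing include directories of the local CUDA toolkit.

    CUDA_HOME is read from the environment; the default path is an assumption.
    """
    home = os.environ.get("CUDA_HOME", default_home)
    candidates = [
        os.path.join(home, "include"),
        os.path.join(home, "targets", "x86_64-linux", "include"),
    ]
    return [d for d in candidates if os.path.isdir(d)]


# Illustrative use inside a setup.py (extension name and sources are hypothetical):
#
#   from torch.utils.cpp_extension import CUDAExtension
#   ext = CUDAExtension(
#       name="my_ext",
#       sources=["src/my_kernel.cu"],
#       include_dirs=cuda_include_dirs(),
#   )
```

An empty list from `cuda_include_dirs()` would indicate that CUDA_HOME (or the assumed default) does not point at a full toolkit install.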

hamedB · January 23, 2024, 5:27pm

Dear @ptrblck

I was able to solve the issue with the file cusolverDn.h by finding it and adding its path to $PATH. Now the installation process gets further and hits a new error, as follows.

home/hamed/PycharmProjects/Deformable_ProtoPNet/Deformable-Convolution-V2-PyTorch/src/cuda/deform_psroi_pooling_cuda.cu:18:10: fatal error: THC/THC.h: No such file or directory
18 | #include <THC/THC.h>
| ^~~~~~~~~~~
compilation terminated.
error: command ‘/usr/local/cuda/bin/nvcc’ failed with exit code 1

Has THC/THC.h been removed from PyTorch 2.1.2?

ptrblck · January 23, 2024, 7:12pm

Yes, the THC namespace is dead, as described in this topic, so you would need to update your code to use the new ATen API.

ssbarve2 · April 6, 2024, 7:22pm

I was facing a similar issue. In my case, the compiler wasn’t looking for cusolverDn.h in the right location.
Where it is present:

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
V11.6
$ find /usr/ -name cusolverDn.h
/usr/local/cuda-11.6/targets/x86_64-linux/include/cusolverDn.h

ERROR: Compiler was looking for it within the pytorch package.

/../miniconda3/envs/nerfacc/lib/python3.9/site-packages/torch/include/ATen/cuda/CUDAContext.h:10:10: fatal error: cusolverDn.h: No such file or directory
   10 | #include <cusolverDn.h>

Adding the following to my .bashrc resolved the issue (adding the CUDA toolkit path to the $PATH variable didn’t help in my case):

export CPLUS_INCLUDE_PATH=/usr/local/cuda-11.6/include:$CPLUS_INCLUDE_PATH
export C_INCLUDE_PATH=/usr/local/cuda-11.6/include:$C_INCLUDE_PATH
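One detail about the exports above: GCC treats an empty element in these variables as the current working directory, so a trailing colon (which appears when the variable was previously unset) adds "." to the search path. A minimal sketch of prepending a directory without leaving empty entries follows. `prepend_path` is a hypothetical helper, and the CUDA 11.6 path in the usage note is taken from the post above.

```python
import os


def prepend_path(var, directory, env=None):
    """Prepend `directory` to a path-list variable, dropping empty and duplicate entries."""
    env = os.environ if env is None else env
    current = env.get(var, "")
    # keep existing entries, but skip empty ones and earlier copies of `directory`
    parts = [p for p in current.split(os.pathsep) if p and p != directory]
    env[var] = os.pathsep.join([directory] + parts)
    return env[var]
```

For example, `prepend_path("CPLUS_INCLUDE_PATH", "/usr/local/cuda-11.6/include")` updates the current process environment, which a build started from the same process (for instance via subprocess) would inherit.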