Not able to include cusolverDn.h

I installed PyTorch on Ubuntu 20.04 using both the conda command and the pip3 command (in a conda environment).
However, I am not able to find the cusolverDn.h header anywhere inside the anaconda3 folder.
I am using the ATen/cuda/CUDAContext.h header file in a CUDA implementation, and it depends on cusolverDn.h (as the error message says).
Please help me resolve the issue.

The cuSOLVER headers ship with your locally installed CUDA toolkit. Could you explain your use case a bit more? It seems you are trying to build PyTorch from source using the conda binaries.

I was trying to work with the official Deformable-DETR network, which implements its deformable attention module in CUDA. That implementation includes the torch library header files for ATen and others, including the one mentioned in my question.
I am working on a system with Ubuntu 20.04 and anaconda as the package manager (conda 22.11.1). The nvcc version is v10.1.243. I tried installing PyTorch using both the conda command and the pip3 command given on the official website. I didn’t build it from source. In both cases, VS Code IntelliSense reported that cusolverDn.h cannot be opened. When I searched for the file, I could not find it.

The PyTorch binaries do not ship with the entire CUDA toolkit and all of its headers.
If you want to build Deformable-DETR from source, make sure to install PyTorch as one dependency, but use your locally installed CUDA toolkit (including the cuSOLVER headers) to build the DETR lib.
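To check whether a local CUDA toolkit actually provides the header before starting the build, a small script like the one below may help. It is only a sketch: the helper name find_cuda_header is made up for illustration, and it assumes the toolkit lives under CUDA_HOME, CUDA_PATH, or the common Linux default /usr/local/cuda, which may not match your machine.

```python
import os
from pathlib import Path


def find_cuda_header(header="cusolverDn.h", roots=None):
    """Return the first <root>/include/<header> that exists, or None.

    `roots` is a list of candidate CUDA install prefixes. The defaults
    below are assumptions about a typical Linux setup, not guarantees.
    """
    if roots is None:
        roots = [
            os.environ.get("CUDA_HOME"),
            os.environ.get("CUDA_PATH"),
            "/usr/local/cuda",
        ]
    for root in roots:
        if not root:
            continue  # skip unset environment variables
        candidate = Path(root) / "include" / header
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    found = find_cuda_header()
    print(found if found else "cusolverDn.h not found in the checked CUDA prefixes")
```

If this finds nothing, the full CUDA toolkit is probably not installed locally, or it sits in a prefix that is not in the list.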

I think this may be related to a recent change in the conda packages. I just created a new conda environment, installed PyTorch according to the official documentation (conda install pytorch torchvision torchaudio pytorch-cuda=11.6 -c pytorch -c nvidia), and tried to compile apex from source. The compile command issued by pip looks like this:

/vc_data/users/heyangqin/anaconda3/envs/deepspeed/bin/nvcc  -I/vc_data/users/heyangqin/anaconda3/envs/deepspeed/lib/python3.10/site-packages/torch/include -I/vc_data/users/heyangqin/anaconda3/envs/deepspeed/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/vc_data/users/heyangqin/anaconda3/envs/deepspeed/lib/python3.10/site-packages/torch/include/TH -I/vc_data/users/heyangqin/anaconda3/envs/deepspeed/lib/python3.10/site-packages/torch/include/THC -I/vc_data/users/heyangqin/anaconda3/envs/deepspeed/include -I/vc_data/users/heyangqin/anaconda3/envs/deepspeed/include/python3.10 -c -c /vc_data/users/heyangqin/apex/csrc/multi_tensor_sgd_kernel.cu -o /vc_data/users/heyangqin/apex/build/temp.linux-x86_64-cpython-310/csrc/multi_tensor_sgd_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -lineinfo -O3 --use_fast_math -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 -std=c++14

This compile command calls the nvcc from the conda environment and does not include the system CUDA directory /usr/local/cuda/include/, where cusolverDn.h is located, which causes the error. I manually updated the PATH with export PATH=/usr/local/cuda/bin:$PATH and the error is gone. Is this the intended behavior?
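For anyone who wants to confirm which nvcc the build will pick up, here is a rough sketch. The helper name nvcc_has_header is hypothetical, and it assumes the conventional layout where headers sit in <prefix>/include next to <prefix>/bin/nvcc.

```python
import shutil
from pathlib import Path


def nvcc_has_header(header="cusolverDn.h", path=None):
    """Return (nvcc_path, has_header) for the first nvcc found on `path`.

    `path` defaults to the current PATH. Assumes headers live in
    <prefix>/include for an nvcc located at <prefix>/bin/nvcc.
    """
    nvcc = shutil.which("nvcc", path=path)
    if nvcc is None:
        return None, False
    prefix = Path(nvcc).parent.parent
    return nvcc, (prefix / "include" / header).is_file()


if __name__ == "__main__":
    nvcc, ok = nvcc_has_header()
    if nvcc is None:
        print("no nvcc on PATH")
    else:
        status = "found" if ok else "missing"
        print(f"{nvcc}: cusolverDn.h {status} next to this nvcc")
```

If the reported nvcc is the one inside the conda environment and the header is missing, putting /usr/local/cuda/bin first on the PATH as described above should make the system toolkit the one that is used.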

Your explanation sounds right. I don’t think nvcc is supposed to ship as a dependency or to be usable for source builds from the conda environment (at least we are not testing this use case).
Using the local CUDA toolkit is the right approach and we should try to filter out some packages for the next release.

I ran into the same issue because nvcc has been packaged with PyTorch in the latest release. In my case, the fix was not as trivial as updating the PATH variable. A package compiled with the local nvcc throws this error: provided PTX was compiled with an unsupported toolchain. From this link, it looks like the issue might be that the driver version expected by the nvcc bundled with PyTorch differs from the one expected by the local nvcc. Unfortunately, I cannot update my driver, and I still have not found a real solution, but I have a temporary workaround for my use case. Just leaving this here in case others run into it. Hopefully the next releases will go back to not including nvcc.
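To see whether two nvcc binaries really come from different CUDA releases, one option is to compare what each reports for nvcc --version. The sketch below is not the workaround mentioned above; the helper names are made up, and it assumes the usual "release X.Y" line in the nvcc output and /usr/local/cuda/bin/nvcc as the system compiler.

```python
import re
import shutil
import subprocess
from pathlib import Path


def parse_nvcc_release(version_text):
    """Extract (major, minor) from `nvcc --version` output, or None."""
    match = re.search(r"release (\d+)\.(\d+)", version_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def nvcc_release(nvcc_path):
    """Run `<nvcc_path> --version` and return its (major, minor) release."""
    result = subprocess.run(
        [nvcc_path, "--version"], capture_output=True, text=True, check=True
    )
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    # First nvcc on PATH (possibly the conda one) and the assumed system one.
    candidates = [shutil.which("nvcc"), "/usr/local/cuda/bin/nvcc"]
    for nvcc in candidates:
        if nvcc and Path(nvcc).is_file():
            print(nvcc, nvcc_release(nvcc))
```

If the two releases differ, that at least confirms the build and the PyTorch binaries are using different toolkits.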