Not able to include cusolverDn.h

I think this may be related to the recent change in the conda packages. I just created a new conda environment, installed PyTorch according to the official documentation (conda install pytorch torchvision torchaudio pytorch-cuda=11.6 -c pytorch -c nvidia), and tried to compile apex from source. The command line that pip runs looks like this:

/vc_data/users/heyangqin/anaconda3/envs/deepspeed/bin/nvcc  -I/vc_data/users/heyangqin/anaconda3/envs/deepspeed/lib/python3.10/site-packages/torch/include -I/vc_data/users/heyangqin/anaconda3/envs/deepspeed/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/vc_data/users/heyangqin/anaconda3/envs/deepspeed/lib/python3.10/site-packages/torch/include/TH -I/vc_data/users/heyangqin/anaconda3/envs/deepspeed/lib/python3.10/site-packages/torch/include/THC -I/vc_data/users/heyangqin/anaconda3/envs/deepspeed/include -I/vc_data/users/heyangqin/anaconda3/envs/deepspeed/include/python3.10 -c -c /vc_data/users/heyangqin/apex/csrc/multi_tensor_sgd_kernel.cu -o /vc_data/users/heyangqin/apex/build/temp.linux-x86_64-cpython-310/csrc/multi_tensor_sgd_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -lineinfo -O3 --use_fast_math -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 -std=c++14

This command line calls the nvcc from the conda env, and it does not include the system CUDA directory /usr/local/cuda/include/, where cusolverDn.h is located, which causes the error. So I manually updated PATH with export PATH=/usr/local/cuda/bin:$PATH, and the error is gone. Is this the intended behavior?
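
For anyone who wants to check this on their own machine, here is a rough Python sketch that reports which nvcc comes first on PATH and whether cusolverDn.h sits next to it. The helper names cuda_include_dir and has_header are made up for illustration, and the script assumes the conventional layout where the headers live in the include directory beside nvcc's bin directory, which may not hold for every install.

```python
import os
import shutil


def cuda_include_dir(nvcc_path: str) -> str:
    """Return the include directory beside nvcc's bin directory.

    Assumption: the toolkit uses the usual <prefix>/bin/nvcc and
    <prefix>/include layout. Symlinks are not resolved here.
    """
    bin_dir = os.path.dirname(os.path.abspath(nvcc_path))
    return os.path.join(os.path.dirname(bin_dir), "include")


def has_header(nvcc_path: str, header: str = "cusolverDn.h") -> bool:
    """Check whether the header exists in the include dir next to nvcc."""
    return os.path.isfile(os.path.join(cuda_include_dir(nvcc_path), header))


if __name__ == "__main__":
    # shutil.which follows PATH order, so this is the nvcc a build
    # that looks up nvcc on PATH would find first.
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        print("no nvcc found on PATH")
    else:
        print("nvcc on PATH:", nvcc)
        print("include dir: ", cuda_include_dir(nvcc))
        print("cusolverDn.h present:", has_header(nvcc))
```

Running it before and after export PATH=/usr/local/cuda/bin:$PATH should show whether the nvcc being picked up changes from the conda env to the system toolkit.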
