PyTorch cuDNN version mismatch: PyTorch was compiled against 7104 but linked against 7005

hasakii · October 19, 2018, 10:46am

PyTorch still has so many bugs; I am getting tired of using this framework…

So, the problem is: I compiled PyTorch from source and pointed cuDNN at my own path.
There is no other cuDNN or CUDA install anywhere else.

The build log reports cuDNN version 7.1.4, but when I run an LSTM it gives the following error:
RuntimeError: cuDNN version mismatch: PyTorch was compiled against 7104 but linked against 7005.

Can anyone help me?

hasakii · October 19, 2018, 10:51am

Also, after the build finished, I got the following warning:
warning: no library file corresponding to '/mnt/lustre/sjtu/users/mkh96/tools/cuda-9.0/lib64/libcudnn.so' found (skipping)

albanD · October 19, 2018, 11:03am

Hi,

Could you provide the following information, please:

Where is your CUDA install? Do you have both a global and a local one, like /usr/local/cuda and /your/local/cuda/?
What is the content of its lib64 folder (or of each install's, if you have more than one)? In particular, what does ls -la libcudnn* show?
What is the output of echo $LD_LIBRARY_PATH?
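
The three checks above can also be gathered in one go. Here is a minimal sketch in Python, assuming a Linux system. The helper name cudnn_report and the example directory are made up for illustration and are not part of PyTorch:

```python
import glob
import os


def cudnn_report(extra_dirs=(), env=None):
    """Map each candidate directory to the libcudnn* files found in it.

    Candidates are the LD_LIBRARY_PATH entries (in search order) followed by
    any extra directories, e.g. the lib64 folder of each CUDA install.
    """
    env = os.environ if env is None else env
    ld_dirs = [d for d in env.get("LD_LIBRARY_PATH", "").split(os.pathsep) if d]
    report = {}
    for directory in ld_dirs + list(extra_dirs):
        if directory in report:
            continue  # keep the first occurrence only
        files = sorted(glob.glob(os.path.join(directory, "libcudnn*")))
        # Resolve symlinks so that the real versioned file behind libcudnn.so is visible.
        report[directory] = [(f, os.path.realpath(f)) for f in files]
    return report


if __name__ == "__main__":
    # /usr/local/cuda/lib64 is only an example; pass your own install paths.
    for directory, files in cudnn_report(["/usr/local/cuda/lib64"]).items():
        print(directory)
        for path, target in files:
            print("   ", path, "->", target)
```

A directory that shows up with an empty list simply contains no libcudnn* files.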

hasakii · October 19, 2018, 11:48am

Thanks for your advice. I checked all the environment paths, and I’m pretty sure that I used the same cuDNN directory, which is version 7.1.4; I also build other frameworks against this path.

Here is the warning I found in the log:

CMake Warning at cmake/Modules_CUDA_fix/upstream/FindCUDA.cmake:1836 (add_library):
Cannot generate a safe runtime search path for target caffe2_gpu because
files in some directories may conflict with libraries in implicit 
directories: 

runtime library [libcudnn.so.7] in /mnt/lustre/sjtu/users/mkh96/tools/cuda-9.0/lib64 may be hidden by files in: 

/users/xxx/miniconda3/envs/pytorch/lib

Some of these libraries may not be found correctly.
Call Stack (most recent call first):
cmake/public/utils.cmake:134 (cuda_add_library) 
caffe2/CMakeLists.txt:227 (torch_cuda_based_add_library)
-- Generating done

CMake Warning:
 Manually-specified variables were not used by the project:   
CUDNN_LIB_DIR

The above is the relevant part of the build log; /users/xxx/miniconda3/envs/pytorch/lib is my PyTorch conda directory.
I use this Python to compile the PyTorch source, but for God's sake, I DID set CUDNN_LIB_DIR to another directory, because I have MXNet, TensorFlow, Kaldi, and many other frameworks to use, and all of them share a common environment path.

So it’s my fault: I didn’t check the log carefully, and I should not have skipped any warnings.

But I still want to ask: why not use the environment the user set, instead of searching some other, confusing directory?
The user's environment settings should take priority, but according to the log the build did not use them.
This makes it hard to trust the framework: we don’t know when or how errors or bugs will occur, and we end up confused and wasting a lot of time.

I hope you will consider it.

hasakii · October 19, 2018, 11:54am

By the way, I ran export PATH=$HOME/miniconda3/envs/pytorch/bin:$PATH before running python setup.py install.

albanD · October 19, 2018, 12:57pm

Hi,

If this folder comes first in your LD_LIBRARY_PATH, that means the conda env is active, right?
Also, I don't think you’re supposed to add the conda env's bin to PATH by hand. I don’t use conda, so I’m not sure, but from what I remember, activating the env does it for you (and handles the libraries properly as well).

CUDNN_LIB_DIR is used at compile time, which is why your PyTorch was compiled against cuDNN 7.1. At runtime, cuDNN is loaded as a shared library, which keeps the binary smaller and gives more flexibility. The warning printed at compile time is there to tell you that the cuDNN that is going to be loaded at runtime (according to LD_LIBRARY_PATH) may not be the same as the one used for compilation.
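
To see which cuDNN file actually ends up loaded at runtime, one option on Linux is to read /proc/self/maps from inside the Python process. This is a rough sketch only: the helper name loaded_libraries is made up for illustration (it is not a PyTorch API), and it assumes cuDNN is linked dynamically and has already been loaded when you call it:

```python
def loaded_libraries(needle="libcudnn", maps_path="/proc/self/maps"):
    """Return the sorted paths of mapped files whose file name contains `needle`."""
    paths = set()
    try:
        with open(maps_path) as maps:
            for line in maps:
                # Fields: address perms offset dev inode [pathname]
                fields = line.split(None, 5)
                if len(fields) == 6:
                    path = fields[5].strip()
                    if needle in path.rsplit("/", 1)[-1]:
                        paths.add(path)
    except OSError:
        pass  # not Linux, or /proc is unavailable
    return sorted(paths)
```

Call print(loaded_libraries()) after import torch. If the list is empty, try again after running a small cuDNN-backed op; a build that links cuDNN statically will show nothing at all. The directory of the path it prints is the one that won the library search.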

hasakii · October 20, 2018, 1:28am

Thank you very much !

Zhen_Cao · September 6, 2019, 3:18am

Hi albanD.

I encountered a similar issue “RuntimeError: cuDNN version mismatch: PyTorch was compiled against 7102 but linked against 7600”

From your discussion, my understanding is that the PyTorch I am using was compiled against cuDNN 7.1.2 but is trying to run with cuDNN 7.6.0 (please correct me if I am wrong).
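
For reference, the integers in the error message can be decoded like this. This is a small sketch that assumes the MAJOR * 1000 + MINOR * 100 + PATCH encoding used by cuDNN 7.x and 8.x (other major versions may encode differently); the function name is just for illustration:

```python
def decode_cudnn_version(code):
    """Split a cuDNN 7.x/8.x version integer into (major, minor, patch).

    Assumes the MAJOR * 1000 + MINOR * 100 + PATCH encoding; this is an
    assumption that holds for cuDNN 7 and 8, not necessarily for others.
    """
    major, rest = divmod(code, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


for code in (7104, 7005, 7102, 7600):
    print(code, "->", ".".join(map(str, decode_cudnn_version(code))))
```

So 7104 vs 7005 in the first post means 7.1.4 vs 7.0.5, and 7102 vs 7600 means 7.1.2 vs 7.6.0.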

I still haven’t solved this problem after reading your discussion.
Could you give some more background, such as why you asked those three questions?

In my case:
I am trying to run my code in a conda virtual env inside an Ubuntu Docker container.
I checked that I have “/usr/local/cuda”, but I am not sure whether I also have a local CUDA install (how can I check?). I also have both “cuda” and “cuda-9.0” in the folder “/usr/local/”. What does this mean?

Content of the /usr/local/cuda/lib64 folder: “ls: cannot access ‘libcudnn*’: No such file or directory”. The same for cuda-9.0.

echo $LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64

Thank you.

Zhen_Cao · September 9, 2019, 9:38am

Hi Hasakii,

How did you find the build log?

Thanks

albanD · September 9, 2019, 10:12pm

Hi,

I am not a conda expert, but I think the conda cudnn package is where your cuDNN comes from? Make sure that whatever path it installs to is correctly referenced in your environment variables. You can also use “locate libcudnn.so” to find all the versions of cuDNN on your system. Finally, make sure to properly uninstall PyTorch before installing from source, as a binary install with an old cuDNN might be hiding your source install.
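
A quick way to check whether an older install is shadowing the source build is to ask Python which torch it would import. This is a hedged sketch: torch_install_report is a made-up helper name, and the comment about the mismatch check reflects how recent PyTorch versions behave, which I am assuming here:

```python
import importlib.util


def torch_install_report():
    """Describe the torch install this interpreter would import, or None."""
    spec = importlib.util.find_spec("torch")
    if spec is None:
        return None  # torch is not installed for this interpreter
    report = {"location": spec.origin}
    import torch

    report["torch_version"] = torch.__version__
    report["cuda_version"] = torch.version.cuda
    try:
        # Assumption: this call runs the compiled-vs-loaded cuDNN check,
        # so a version mismatch may surface here as a RuntimeError.
        report["cudnn_version"] = torch.backends.cudnn.version()
    except RuntimeError as err:
        report["cudnn_error"] = str(err)
    return report


if __name__ == "__main__":
    print(torch_install_report())
```

If the reported location is not the directory your source build installed into, another torch is being picked up first.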

tyoc213 · February 22, 2021, 6:59am

Also, you can search for old installations like this: find . -name 'libcudnn*' 2>/dev/null

For example, I was getting results for both of these:

find . -name 'libcudnn*8.0.5*' 2>/dev/null
find . -name 'libcudnn*8.1.0*' 2>/dev/null

Then I just ran rm -R /usr/local/cuda-11.1.0, left 11.2.0 in place, and it found the correct library.
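
The same search can be done from Python, grouping the hits by version so that duplicates stand out. This is a minimal sketch; the function name and the file name pattern are my own assumptions about how the versioned cuDNN files are named (e.g. libcudnn.so.8.0.5):

```python
import os
import re

# Matches fully versioned names such as libcudnn.so.8.0.5
# or libcudnn_ops_infer.so.8.1.0 (pattern is an assumption, adjust as needed).
_VERSIONED = re.compile(r"^libcudnn[\w-]*\.so\.(\d+\.\d+\.\d+)$")


def find_cudnn_versions(root):
    """Walk `root` and group fully versioned libcudnn files by version string."""
    found = {}
    # os.walk skips unreadable directories by default, much like 2>/dev/null.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            match = _VERSIONED.match(name)
            if match:
                found.setdefault(match.group(1), []).append(os.path.join(dirpath, name))
    return {version: sorted(paths) for version, paths in found.items()}
```

For example, find_cudnn_versions("/usr/local") returns a dict keyed by version; more than one key means more than one cuDNN version is installed under that root.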