torch C++ build cannot find cuDNN

smartadpole · August 30, 2020, 2:26am

Ubuntu 18, PyTorch installed via pip, CUDA 10.2, cuDNN 8.0.

cmake succeeded and found cuDNN:

-- Caffe2: CUDA detected: 10.2
-- Caffe2: CUDA nvcc is: /usr/local/cuda/bin/nvcc
-- Caffe2: CUDA toolkit directory: /usr/local/cuda
-- Caffe2: Header version is: 10.2
-- Found cuDNN: v8.0.0  (include: /usr/include, library: /usr/lib/aarch64-linux-gnu/libcudnn.so)
-- Autodetected CUDA architecture(s):  5.3
-- Added CUDA NVCC flags for: -gencode;arch=compute_53,code=sm_53
-- Configuring done
-- Generating done

make fails with:

libtorch.so: undefined reference to `cudnnGetConvolutionBackwardDataAlgorithm@libcudnn.so.8

Why?

Lin_Jia · August 30, 2020, 4:19am

It seems that you are trying to build libtorch using cmake. The error means that the cpp file related to the function cannot be found. One way to work around this is to download the distributed binary from https://pytorch.org/ and pick the C++ language option.
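An undefined reference at link time generally means the linker could not find that symbol in the libraries it was given. A minimal sketch for checking whether the installed cuDNN library exports the symbol from the error message is below. The helper name `exports_symbol` is hypothetical; the library path and symbol name are copied from the cmake and make output above, and the result will differ from system to system.

```python
import ctypes


def exports_symbol(library, symbol):
    """Return True if `library` can be loaded and exports `symbol`."""
    try:
        lib = ctypes.CDLL(library)
    except OSError:
        # The shared library itself could not be loaded.
        return False
    try:
        # ctypes looks the symbol up on attribute access and raises
        # AttributeError when the library does not export it.
        getattr(lib, symbol)
        return True
    except AttributeError:
        return False


if __name__ == "__main__":
    # Path and symbol taken from the build log in this thread.
    cudnn = "/usr/lib/aarch64-linux-gnu/libcudnn.so"
    name = "cudnnGetConvolutionBackwardDataAlgorithm"
    print(name, "exported:", exports_symbol(cudnn, name))
```

If this prints False on the target machine, the libtorch.so being linked expects a function that the installed libcudnn does not provide, which would point to a version mismatch between the two rather than a missing source file.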

WMF1997 · August 30, 2020, 7:41am

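Hello,
Judging from /usr/lib/aarch64-linux-gnu/ together with CUDA and cuDNN, you are not compiling on a standard x86_64 machine; you are most likely using something like an NVIDIA Jetson TX2 to compile from source. (It is certainly not hardware virtualization such as KVM, since hardware virtualization cannot virtualize a specific graphics card.)

The reason given by the poster above is correct: a cpp file indeed cannot be found. But the solution is definitely not to download the official libtorch (the official libtorch is built on x86_64 systems…).

So… if you are not in a hurry, a very simple approach is to drop cuDNN and compile with CUDA only. (If I remember correctly, cuDNN is not required; it is optional?)

Once the CUDA-only build works (so that you at least have CUDA as a baseline), it will not be too late to come back to cuDNN. (These are all the details I know.)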

ptrblck · August 30, 2020, 9:04am

Hi Nabla,

Could you please use an online translator to translate your post into English, so that other users can help you?

From Google translate:

Hello,
From /usr/lib/aarch64-linux-gnu/ plus CUDA and CUDNN, you can see that you did not use a standard x86_64 architecture computer to compile, you are likely to use NVIDIA Jetson TX2? source code to compile (Definitely not done with hardware virtualization technology such as KVM, hardware virtualization technology cannot reach the level of virtual designated graphics card)
The reason mentioned by the person above is that there is no problem. There is indeed a cpp file that cannot be found, but the solution is definitely not to download the official libtorch (the official libtorch is compiled by the x86_64 architecture compilation system…)
So,… if you are not in a hurry, a very simple way is to remove CUDNN and just compile with CUDA. (If I remember correctly, CUDNN does not seem to be required? Is it optional?)
If only the content of CUDA is correct… (at least there is a CUDA as a basis) It is not too late to look at it. (I only know the specific details)

WMF1997 · August 30, 2020, 2:24pm

Emmm… sorry about that, I just wanted to answer the question (since I saw that the question by @smartadpole was written in Chinese). I will use English next time.

Here is my reply to @smartadpole and @Lin_Jia, translated into English by myself.

Hello @smartadpole,
From /usr/lib/aarch64-linux-gnu together with CUDA/cuDNN, I can tell that you are not compiling from source on an x86_64 machine; you may be using an NVIDIA Jetson TX2. (You are probably not using KVM to virtualize the hardware, since NVIDIA graphics cards cannot be virtualized.)

Regarding the reply from @Lin_Jia:
The reason is right, since the .cpp cannot be found. BUT the solution is not to download the libtorch officially provided on pytorch.org, because it is precompiled for x86_64 systems and does not fit ARM-based devices.

So… one solution is to compile again with cuDNN excluded (using just CUDA is okay). At least PyTorch with CUDA alone can still be accelerated.

Yours sincerely,
@wmf1997

Yannis · December 3, 2022, 3:03pm

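Could you explain in more detail how to build libtorch without cudnn.so? I ran into a problem similar to the original poster's and would appreciate an answer. Thanks!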
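A minimal sketch of the "compile again with cuDNN excluded" suggestion, assuming PyTorch is rebuilt from a source checkout and that its `setup.py` reads the `USE_CUDA` and `USE_CUDNN` environment variables. The helper names `build_env` and `run_build` and the checkout path are hypothetical, and a Jetson build may need further switches not shown here.

```python
import os
import subprocess
import sys

# Assumption: these environment switches are honoured by PyTorch's
# setup.py when building from source.
OVERRIDES = {"USE_CUDA": "1", "USE_CUDNN": "0"}


def build_env(base=None):
    """Copy the given (or current) environment and disable cuDNN in it."""
    env = dict(os.environ if base is None else base)
    env.update(OVERRIDES)
    return env


def run_build(source_dir):
    """Run the source build inside `source_dir` with cuDNN switched off.

    `source_dir` is a hypothetical path to a local PyTorch checkout.
    """
    return subprocess.run(
        [sys.executable, "setup.py", "install"],
        cwd=source_dir,
        env=build_env(),
        check=True,
    )
```

Calling `run_build("/path/to/pytorch")` would start the build, which can take hours on an embedded board. Afterwards the C++ project would need to be re-run through cmake against the newly built libtorch rather than the pip-installed one.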