Runtime error: "ImportError: /usr/lib64/libtorch_cuda.so: undefined symbol: cudnnSetDropoutDescriptor"

When I import torch, I get:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.11/site-packages/torch/__init__.py", line 367, in <module>
    from torch._C import *  # noqa: F403
    ^^^^^^^^^^^^^^^^^^^^^^
ImportError: /usr/lib64/libtorch_cuda.so: undefined symbol: cudnnSetDropoutDescriptor

This is a bit similar to #119072; cf. #139967
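A useful first check (a sketch; the cuDNN path is taken from the collect_env.py output below) is whether the installed cuDNN exports the symbol at all:

nm -D /usr/share/cuda/lib64/libcudnn.so.9.5.1 | grep cudnnSetDropoutDescriptor

If this prints the symbol, cuDNN itself is fine, and libtorch_cuda.so was presumably built against the cuDNN headers but never linked against the library.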

Versions

python collect_env.py

Collecting environment information...
PyTorch version: N/A
Is debug build: N/A
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: N/A

OS: Slackware Linux  (x86_64)
GCC version: (GCC) 14.2.0
Clang version: 19.1.3
CMake version: version 3.30.5
Libc version: glibc-2.40

Python version: 3.11.10 (main, Sep  8 2024, 13:14:52) [GCC 14.2.0] (64-bit runtime)
Python platform: Linux-6.11.6-x86_64-AMD_Ryzen_Threadripper_2990WX_32-Core_Processor-with-glibc2.40
Is CUDA available: N/A
CUDA runtime version: 12.6.77
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: GPU 0: Quadro RTX 4000
Nvidia driver version: 560.35.03
cuDNN version: /usr/share/cuda/lib64/libcudnn.so.9.5.1
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: N/A

CPU:
Architecture:                         x86_64
CPU op-mode(s):                       32-bit, 64-bit
Address sizes:                        43 bits physical, 48 bits virtual
Byte Order:                           Little Endian
CPU(s):                               64
On-line CPU(s) list:                  0-63
Vendor ID:                            AuthenticAMD
Model name:                           AMD Ryzen Threadripper 2990WX 32-Core Processor
CPU family:                           23
Model:                                8
Thread(s) per core:                   2
Core(s) per socket:                   32
Socket(s):                            1
Stepping:                             2
Frequency boost:                      enabled
CPU(s) scaling MHz:                   73%
CPU max MHz:                          3000.0000
CPU min MHz:                          2200.0000
BogoMIPS:                             5988.12
Flags:                                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid amd_dcm aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb hw_pstate ssbd ibpb vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov succor smca sev sev_es
Virtualization:                       AMD-V
L1d cache:                            1 MiB (32 instances)
L1i cache:                            2 MiB (32 instances)
L2 cache:                             16 MiB (32 instances)
L3 cache:                             64 MiB (8 instances)
NUMA node(s):                         4
NUMA node0 CPU(s):                    0-7,32-39
NUMA node1 CPU(s):                    16-23,48-55
NUMA node2 CPU(s):                    8-15,40-47
NUMA node3 CPU(s):                    24-31,56-63
Vulnerability Gather data sampling:   Not affected
Vulnerability Itlb multihit:          Not affected
Vulnerability L1tf:                   Not affected
Vulnerability Mds:                    Not affected
Vulnerability Meltdown:               Not affected
Vulnerability Mmio stale data:        Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed:               Mitigation; untrained return thunk; SMT vulnerable
Vulnerability Spec rstack overflow:   Mitigation; Safe RET
Vulnerability Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:             Mitigation; Retpolines; IBPB conditional; STIBP disabled; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Vulnerability Srbds:                  Not affected
Vulnerability Tsx async abort:        Not affected

Versions of relevant libraries:
[pip3] flake8==7.0.0
[pip3] numpy==1.26.3
[pip3] torch==2.5.0a0+gitunknown
[conda] Could not collect

It seems you’ve compiled from source, based on torch==2.5.0a0+gitunknown, but it’s unclear which commit you are using and whether cuDNN was properly detected during your build.
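Since the import itself fails, one way to read the recorded commit without importing torch (a sketch; the site-packages path is taken from the traceback above) is the generated version file:

grep git_version /usr/lib64/python3.11/site-packages/torch/version.py

A build from a release tarball without git metadata typically records Unknown here, which would also explain the +gitunknown suffix.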

cuDNN was detected:

--   USE_CUDA              : ON
--     Split CUDA          : 
--     CUDA static link    : OFF
--     USE_CUDNN           : ON
--     USE_CUSPARSELT      : OFF
--     USE_CUDSS           : OFF
--     USE_CUFILE          : OFF
--     CUDA version        : 12.6
--     USE_FLASH_ATTENTION : ON
--     USE_MEM_EFF_ATTENTION : ON
--     cuDNN version       : 9.5.1
--     CUDA root directory : /opt/cuda-12.6
--     CUDA library        : /usr/lib64/libcuda.so
--     cudart library      : /opt/cuda-12.6/lib/libcudart.so
--     cublas library      : /opt/cuda-12.6/lib64/libcublas.so
--     cufft library       : /opt/cuda-12.6/lib64/libcufft.so
--     curand library      : /opt/cuda-12.6/lib64/libcurand.so
--     cusparse library    : /opt/cuda-12.6/lib64/libcusparse.so
--     cuDNN library       : /usr/share/cuda/lib64
--     nvrtc               : /opt/cuda-12.6/lib/libnvrtc.so
--     CUDA include path   : /opt/cuda-12.6/include
--     NVCC executable     : /opt/cuda-12.6/bin/nvcc
--     CUDA compiler       : /opt/cuda-12.6/bin/nvcc
--     CUDA flags          :  -DLIBCUDACXX_ENABLE_SIMPLIFIED_COMPLEX_OPERATIONS -D_GLIBCXX_USE_CXX11_ABI=1 -Xfatbin -compress-all -DONNX_NAMESPACE=onnx_torch -gencode arch=compute_75,code=sm_75 -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda  -Wno-deprecated-gpu-targets --expt-extended-lambda -DCUB_WRAPPED_NAMESPACE=at_cuda_detail -DCUDA_HAS_FP16=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__
--     CUDA host compiler  : /opt/gcc-13.2/usr/bin/gcc
--     CUDA --device-c     : OFF
--     USE_TENSORRT        : 
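For completeness, the cuDNN-related values the configure step cached can be inspected directly (a sketch; this assumes the default build/ directory inside the unpacked source tree):

grep -i cudnn build/CMakeCache.txt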

Could it have to do with the fact that my CUDA install is in /opt while cuDNN is in the standard directory?

No, this shouldn’t be an issue. Which commit are you using?

I’m using https://github.com/pytorch/pytorch/releases/download/v2.5.1/pytorch-v2.5.1.tar.gz

Thank you!
I cannot reproduce the issue and can properly build torch==2.5.1 with CUDA 12.6 and cuDNN 9.5.1:

python -c "import torch; print(torch.__version__); print(torch.version.cuda); print(torch.backends.cudnn.version()); print(torch.cuda.is_available()); print(torch.randn(1).cuda())"
2.5.0a0+gita8d6afb
12.6
90501
True
tensor([0.9929], device='cuda:0')

Note that commit a8d6afb points to the v2.5.1 tag.
cuDNN was also detected and is available:

...
--   USE_CUDA              : ON
--     Split CUDA          : 
--     CUDA static link    : OFF
--     USE_CUDNN           : ON
--     USE_CUSPARSELT      : ON
--     USE_CUDSS           : OFF
--     USE_CUFILE          : OFF
--     CUDA version        : 12.6
--     USE_FLASH_ATTENTION : ON
--     USE_MEM_EFF_ATTENTION : ON
--     cuDNN version       : 9.5.1
--     cuSPARSELt version  : 0.6.3
...

ldd /usr/lib64/libtorch_cuda.so reveals that it is not linked against libcudnn.so at all, so this seems to be a build/linking issue, which is odd because I’m explicitly specifying:

     -DCUDNN_INCLUDE_DIR=/usr/share/cuda/include \
     -DCUDNN_LIBRARY=/usr/share/cuda/lib64/libcudnn.so \
     -DCUDNN_LIBRARY_PATH=/usr/share/cuda/lib64 \
     -DUSE_CUDA=ON \
     -DUSE_CUDNN=ON \

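For what it’s worth, the missing dependency can also be confirmed from the ELF headers directly (same library path as above); a successful link would have added a NEEDED entry for libcudnn.so.9:

objdump -p /usr/lib64/libtorch_cuda.so | grep -i cudnn    # prints no NEEDED entry here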
This is odd, too: the build summary above reports "cuDNN library : /usr/share/cuda/lib64", i.e. a directory rather than the library itself. It should say /usr/share/cuda/lib64/libcudnn.so, which is what I specified for CMake (-DCUDNN_LIBRARY=/usr/share/cuda/lib64/libcudnn.so).

Yes, this was the issue! I put cuDNN within /opt/cuda-12.6/, and ldd now shows that /usr/lib64/libtorch_cuda.so links to

libcudnn.so.9 => /opt/cuda-12.6/lib64/libcudnn.so.9 (0x00007f9b89800000)

So the CMake setup doesn’t seem to pick up cuDNN from non-standard directories, even when the paths are passed explicitly via -DCUDNN_LIBRARY.
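For anyone hitting the same thing: the workaround that fixed it here was simply making cuDNN visible inside the CUDA root. The exact commands below are a sketch (symlinks shown; copying the files should work just as well):

# expose the cuDNN headers and libraries inside the CUDA root
ln -s /usr/share/cuda/include/cudnn*.h /opt/cuda-12.6/include/
ln -s /usr/share/cuda/lib64/libcudnn* /opt/cuda-12.6/lib64/

After rebuilding, ldd resolves libcudnn.so.9 from /opt/cuda-12.6/lib64 as shown above.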