PyTorch compatible with NVIDIA-SMI 535.104.05, Driver Version: 535.104.05, CUDA Version: 12.2

Rmelis80 · October 19, 2023, 1:50pm

Hello,
I'm trying to run the following transfer learning code with GPU-enabled PyTorch (Transfer Learning for Computer Vision Tutorial — PyTorch Tutorials 2.2.0+cu121 documentation)
on my Ubuntu 20.04 LTS machine with the following NVIDIA driver version (see attached image)

but I encountered several incompatibilities:

First, with the CUDA version as installed, the dynamic library libcudnn.so.8 fails to load. When I tried installing earlier PyTorch versions and their associated CUDA versions, the torch library gave me other errors.

Can you suggest a suitable combination of torch version and my NVIDIA workstation setup? Thanks in advance!

ptrblck · October 19, 2023, 2:26pm

Driver 535.104.05 is compatible with every stable and nightly binary we are building.

I don’t understand the statement about libcudnn.so.8 failing to load, since the PyTorch binaries ship with their own CUDA dependencies (including cuDNN, NCCL, etc.) and load these. Your locally installed CUDA toolkit won’t be used, so if you want to use a custom setup you should build PyTorch from source.
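To see which CUDA and cuDNN versions an installed PyTorch binary reports, a small check along these lines may help. It is only a sketch: the helper name `describe_torch_build` is made up for illustration, and it assumes that `torch.version.cuda`, `torch.backends.cudnn.version()` and `torch.cuda.is_available()` behave as in recent releases. The reported versions describe what PyTorch was built with or loaded, not the locally installed CUDA toolkit.

```python
import importlib


def describe_torch_build():
    """Collect the CUDA/cuDNN versions reported by the installed PyTorch.

    Hypothetical helper for illustration; returns None values when
    PyTorch cannot be imported at all.
    """
    try:
        torch = importlib.import_module("torch")
    except (ImportError, OSError):
        # PyTorch is missing or one of its shared libraries failed to load.
        return {"torch": None, "cuda": None, "cudnn": None, "gpu_available": False}

    return {
        "torch": str(torch.__version__),
        # CUDA version the binary was built with (None for CPU-only builds).
        "cuda": torch.version.cuda,
        # cuDNN version as an integer, or None if cuDNN is unavailable.
        "cudnn": torch.backends.cudnn.version(),
        # True only if the driver and at least one GPU are usable.
        "gpu_available": bool(torch.cuda.is_available()),
    }


if __name__ == "__main__":
    for key, value in describe_torch_build().items():
        print(f"{key}: {value}")
```

If `gpu_available` is true and `cuda` shows a version, the binary and the driver work together, regardless of what `nvcc --version` or `/usr/local/cuda` says.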

Rmelis80 · October 20, 2023, 7:31am

Hi,

Sorry for the lack of clarity in my question (I’m still a novice with torch…). I’ll try to explain it better now:

The error appears only when I run the training in the previously mentioned code (the necessary libraries were all loaded correctly). Here is the full message:

“Could not load library libcudnn_cnn_train.so.8. Error: /usr/local/cuda/lib64/libcudnn_cnn_train.so.8: undefined symbol: _ZN5cudnn3cnn34layerNormFwd_execute_internal_implERKNS_7backend11VariantPackEP11CUstream_stRNS0_18LayerNormFwdParamsERKNS1_20NormForwardOperationEmb, version libcudnn_cnn_infer.so.8”

In particular, I don’t understand this part of the error (maybe it comes from the code?):
" undefined symbol: _ZN5cudnn3cnn34layerNormFwd_execute_internal_implERKNS_7backend11VariantPackEP11CUstream_stRNS0_18LayerNormFwdParamsERKNS1_20NormForwardOperationEmb"

ptrblck · October 20, 2023, 12:55pm

The system path points to a cuDNN installation under /usr (the error shows /usr/local/cuda/lib64), so either remove this cuDNN version completely and let PyTorch use its own cuDNN libs, or remove this path from LD_LIBRARY_PATH as a workaround.
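For the LD_LIBRARY_PATH workaround, a rough sketch of how to find the offending entries is shown below. The function names `dirs_with_cudnn` and `without_cudnn_dirs` are hypothetical, and the sketch assumes the system cuDNN is picked up through LD_LIBRARY_PATH rather than through another loader configuration. It only prints a suggested value; it does not change anything.

```python
import glob
import os


def dirs_with_cudnn(ld_library_path):
    """Return the LD_LIBRARY_PATH entries that contain libcudnn*.so* files."""
    entries = [p for p in ld_library_path.split(os.pathsep) if p]
    return [p for p in entries if glob.glob(os.path.join(p, "libcudnn*.so*"))]


def without_cudnn_dirs(ld_library_path):
    """Return ld_library_path with the cuDNN-containing entries removed."""
    shadowing = set(dirs_with_cudnn(ld_library_path))
    kept = [
        p for p in ld_library_path.split(os.pathsep)
        if p and p not in shadowing
    ]
    return os.pathsep.join(kept)


if __name__ == "__main__":
    current = os.environ.get("LD_LIBRARY_PATH", "")
    print("Entries containing cuDNN:", dirs_with_cudnn(current))
    print("Suggested value: export LD_LIBRARY_PATH=" + without_cudnn_dirs(current))
```

The new value has to be exported in the shell before Python starts, because the dynamic loader reads LD_LIBRARY_PATH when the process is launched. Note that dropping a directory such as /usr/local/cuda/lib64 also hides the other libraries in it from that path, which is fine for the PyTorch binaries since they bring their own.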

Rmelis80 · October 23, 2023, 7:16am

I see, many thanks again. The code now runs properly.