PyTorch compatible with NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2

Hello,
I'm trying to run, with PyTorch GPU support, the transfer learning code from the Transfer Learning for Computer Vision Tutorial (PyTorch Tutorials 2.1.0+cu121 documentation)
on my Ubuntu 20.04 LTS machine with the following NVIDIA driver version (see attached image),

but I encountered several incompatibilities:

First, with the CUDA version as installed, the dynamic library libcudnn.so.8 fails to load. When I tried installing earlier PyTorch versions and their associated CUDA versions, the torch library gave me other errors.

Can you suggest a proper combination of torch version and my NVIDIA workstation setup? Thanks in advance!

Driver 535.104.05 is compatible with every stable and nightly binary we are building.

I don’t understand this statement, since the PyTorch binaries ship with their own CUDA dependencies (including cuDNN, NCCL, etc.) and load these. Your locally installed CUDA toolkit won’t be used, so if you want to use a custom setup you should build PyTorch from source.
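
To see which CUDA and cuDNN versions an installed binary actually ships with, something like the sketch below may help. The function name `describe_torch_build` is hypothetical and only used for illustration; the `torch` attributes it reads are the standard version fields. If `torch` is not importable in the current environment, it simply reports that.

```python
import importlib.util


def describe_torch_build():
    """Return the CUDA/cuDNN versions bundled with the installed PyTorch binary.

    Hypothetical helper for illustration; returns {"torch": None} when
    PyTorch is not installed in the current environment.
    """
    if importlib.util.find_spec("torch") is None:
        return {"torch": None}

    import torch

    return {
        "torch": torch.__version__,
        # CUDA version the binary was built with (None for CPU-only builds).
        "bundled_cuda": torch.version.cuda,
        # cuDNN version PyTorch loaded (None if cuDNN is unavailable).
        "bundled_cudnn": torch.backends.cudnn.version(),
        # True only if the driver and a GPU are usable from this process.
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in describe_torch_build().items():
        print(f"{key}: {value}")
```

The values reported here describe the libraries bundled with the binary, not the CUDA version shown by nvidia-smi, which only reflects what the driver supports.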


Hi,

Sorry for the lack of clarity in my question (I’m still a novice with torch…). I’ll try to explain it better now:

The error appears only when I run the training in the previously mentioned code (the necessary libraries were all loaded correctly). Here is the full message:

“Could not load library libcudnn_cnn_train.so.8. Error: /usr/local/cuda/lib64/libcudnn_cnn_train.so.8: undefined symbol: _ZN5cudnn3cnn34layerNormFwd_execute_internal_implERKNS_7backend11VariantPackEP11CUstream_stRNS0_18LayerNormFwdParamsERKNS1_20NormForwardOperationEmb, version libcudnn_cnn_infer.so.8”

In particular, I don’t understand this part of the error (maybe it comes from the code?):
"undefined symbol: _ZN5cudnn3cnn34layerNormFwd_execute_internal_implERKNS_7backend11VariantPackEP11CUstream_stRNS0_18LayerNormFwdParamsERKNS1_20NormForwardOperationEmb"

The system path points to a cuDNN installation in /usr, so either remove this cuDNN version completely and let PyTorch use its own cuDNN libs, or remove this path from LD_LIBRARY_PATH as a workaround.
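
As a rough way to find which LD_LIBRARY_PATH entries expose a system cuDNN, a small script along these lines could be used. It is only a sketch: `find_cudnn_dirs` and `strip_dirs` are hypothetical helper names, and the `libcudnn*.so*` file pattern is an assumption based on the library names in the error message above.

```python
import glob
import os


def find_cudnn_dirs(ld_library_path):
    """Return the LD_LIBRARY_PATH entries that contain libcudnn shared objects."""
    hits = []
    for directory in ld_library_path.split(os.pathsep):
        # Skip empty entries produced by leading/trailing/double separators.
        if directory and glob.glob(os.path.join(directory, "libcudnn*.so*")):
            hits.append(directory)
    return hits


def strip_dirs(ld_library_path, unwanted):
    """Return ld_library_path with the entries listed in `unwanted` removed."""
    kept = [d for d in ld_library_path.split(os.pathsep) if d and d not in unwanted]
    return os.pathsep.join(kept)


if __name__ == "__main__":
    current = os.environ.get("LD_LIBRARY_PATH", "")
    offending = find_cudnn_dirs(current)
    print("Entries containing cuDNN:", offending)
    print("Suggested LD_LIBRARY_PATH:", strip_dirs(current, offending))
```

The suggested value has to be exported in the shell before starting Python, because the dynamic loader reads LD_LIBRARY_PATH when the process launches; changing `os.environ` inside an already running interpreter does not affect it. Note that dropping a whole directory also hides any other libraries stored there.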


I see, many thanks again. The code now runs properly.