Could not load library libcudnn_cnn_train.so.8 in new version

cinjon · October 28, 2023, 4:48am

I have a model that uses torchaudio.transforms.MelSpectrogram and torchaudio.models.Conformer. It works with torch==2.0.0, torchaudio==2.0.1, and torchdata==0.6.0. However, it does not work with the latest packages: torch==2.1.0, torchaudio==2.1.0, and torchdata==0.7.0. This is a problem for me because I need features from the newer releases.

The problem that arises is this error:

Could not load library libcudnn_cnn_train.so.8. Error: /usr/local/cuda/lib64/libcudnn_cnn_train.so.8: undefined symbol: _ZN5cudnn3cnn5infer22queryClusterPropertiesERPhS3_, version libcudnn_cnn_infer.so.8
Traceback (most recent call last):
...
  File "/venvs/bipu2/lib/python3.10/site-packages/torch/_tensor.py", line 492, in backward
    torch.autograd.backward(
  File "/venvs/bipu2/lib/python3.10/site-packages/torch/autograd/__init__.py", line 251, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: GET was unable to find an engine to execute this computation

It comes up when loss.backward() runs, so it only happens during training.

What’s going on?

ptrblck · October 28, 2023, 5:15pm

PyTorch ships with its own CUDA dependencies (including cuDNN), but the error message points to a locally installed cuDNN version in /usr/local/cuda/lib64. Either uninstall the local cuDNN as a workaround, or remove its directory from LD_LIBRARY_PATH so that PyTorch can use its own version.
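
For the second option, here is a minimal sketch of how the offending entries could be filtered out. It assumes the local cuDNN is being picked up through LD_LIBRARY_PATH (and not, for example, through the system linker cache), and the helper name strip_cudnn_dirs is hypothetical, not part of PyTorch. Note that dropping a directory such as /usr/local/cuda/lib64 also hides any other libraries it contains.

```python
import glob
import os


def strip_cudnn_dirs(ld_library_path: str) -> str:
    """Return the search path without directories that contain libcudnn*.so* files.

    Hypothetical helper for illustration; it only inspects the string it is given
    and does not modify the environment.
    """
    kept = []
    for entry in ld_library_path.split(os.pathsep):
        if not entry:
            # Skip empty entries left by leading, trailing, or doubled separators.
            continue
        pattern = os.path.join(glob.escape(entry), "libcudnn*.so*")
        if glob.glob(pattern):
            # A local cuDNN lives here and could shadow the copy bundled with PyTorch.
            continue
        kept.append(entry)
    return os.pathsep.join(kept)


if __name__ == "__main__":
    # Print the cleaned value so the calling shell can export it.
    print(strip_cudnn_dirs(os.environ.get("LD_LIBRARY_PATH", "")))
```

The dynamic loader reads LD_LIBRARY_PATH when a process starts, so the cleaned value should be exported in the shell before launching the training script, rather than assigned to os.environ from inside the running Python process.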

cinjon · October 30, 2023, 6:26pm

Cool, thanks, that seems to have worked. I appreciate the fast response.