Could not load library libcudnn_cnn_train.so.8 after upgrading to torch 2.1.0

I have a model that uses torchaudio.transforms.MelSpectrogram and torchaudio.models.Conformer. It works with torch==2.0.0, torchaudio==2.0.1, and torchdata==0.6.0. However, it fails with the latest packages: torch==2.1.0, torchaudio==2.1.0, and torchdata==0.7.0. This is a problem for me because I need features from the newer releases.

The problem that arises is this error:

Could not load library libcudnn_cnn_train.so.8. Error: /usr/local/cuda/lib64/libcudnn_cnn_train.so.8: undefined symbol: _ZN5cudnn3cnn5infer22queryClusterPropertiesERPhS3_, version libcudnn_cnn_infer.so.8
Traceback (most recent call last):
...
  File "/venvs/bipu2/lib/python3.10/site-packages/torch/_tensor.py", line 492, in backward
    torch.autograd.backward(
  File "/venvs/bipu2/lib/python3.10/site-packages/torch/autograd/__init__.py", line 251, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: GET was unable to find an engine to execute this computation

It comes up after running loss.backward(), so it only happens during training.

What’s going on?

PyTorch ships with its own CUDA dependencies (including cuDNN), and the error message points to a locally installed cuDNN version in /usr/local/cuda/lib64. As a workaround, either uninstall the local cuDNN or remove its directory from LD_LIBRARY_PATH so that PyTorch can use its own version.
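
To see which LD_LIBRARY_PATH entries are likely to shadow the bundled cuDNN, something like the following might help. It is only a rough sketch using the standard library: the helper names find_cudnn_dirs and strip_dirs are made up for illustration, and it assumes a Linux system where the path entries are separated by ":" and the cuDNN files are named libcudnn*.so*.

```python
import glob
import os


def find_cudnn_dirs(ld_library_path):
    """Return the directories on the path string that contain libcudnn files."""
    hits = []
    for directory in ld_library_path.split(os.pathsep):
        if not directory:
            continue  # skip empty entries such as a trailing separator
        if glob.glob(os.path.join(directory, "libcudnn*.so*")):
            hits.append(directory)
    return hits


def strip_dirs(ld_library_path, dirs_to_remove):
    """Return the path string without the given directories, order preserved."""
    remove = set(dirs_to_remove)
    kept = [d for d in ld_library_path.split(os.pathsep) if d and d not in remove]
    return os.pathsep.join(kept)


if __name__ == "__main__":
    current = os.environ.get("LD_LIBRARY_PATH", "")
    conflicting = find_cudnn_dirs(current)
    print("Directories with a local cuDNN:", conflicting)
    print("Suggested LD_LIBRARY_PATH:", strip_dirs(current, conflicting))
```

The suggested value has to be exported in the shell before starting Python, since the dynamic loader reads LD_LIBRARY_PATH when the process starts. Note that dropping a directory such as /usr/local/cuda/lib64 also hides any other libraries in it, so check whether something else depends on that entry. After the change, torch.backends.cudnn.version() should report the cuDNN version PyTorch actually loaded.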

Cool, thanks, that seems to have worked. I appreciate the fast response.