Do PyTorch containers come with cuDNN installed?

Hello,

I am running a Docker container based on the official pytorch/pytorch:1.7.1-cuda11.0-cudnn8-runtime image.

I am also using the onnxruntime-gpu package to serve the models from the container. However, onnxruntime fails on import with:

  File "/home/mrc/.local/lib/python3.8/site-packages/onnxruntime/__init__.py", line 24, in <module>
    from onnxruntime.capi._pybind_state import get_all_providers, get_available_providers, get_device, set_seed, \
  File "/home/mrc/.local/lib/python3.8/site-packages/onnxruntime/capi/_pybind_state.py", line 9, in <module>
    import onnxruntime.capi._ld_preload  # noqa: F401
  File "/home/mrc/.local/lib/python3.8/site-packages/onnxruntime/capi/_ld_preload.py", line 13, in <module>
    _libcudnn = CDLL("libcudnn.so.8", mode=RTLD_GLOBAL)
  File "/opt/conda/lib/python3.8/ctypes/__init__.py", line 373, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libcudnn.so.8: cannot open shared object file: No such file or directory
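The failing call can be reproduced outside onnxruntime with a minimal sketch like the one below. It uses only the standard library; `try_load` is a hypothetical helper name, and it mirrors only the `CDLL("libcudnn.so.8", mode=RTLD_GLOBAL)` line from the traceback, not onnxruntime's full preload logic.

```python
import ctypes
from typing import Tuple


def try_load(libname: str) -> Tuple[bool, str]:
    """Hypothetical helper: dlopen `libname` the way the traceback shows
    onnxruntime's _ld_preload doing it (ctypes.CDLL with RTLD_GLOBAL).

    Returns (True, "") on success, or (False, error message) on failure.
    """
    try:
        ctypes.CDLL(libname, mode=ctypes.RTLD_GLOBAL)
    except OSError as exc:
        # Same exception type as in the traceback above.
        return False, str(exc)
    return True, ""


if __name__ == "__main__":
    ok, err = try_load("libcudnn.so.8")
    print("loaded" if ok else f"failed: {err}")
```

If this prints the same "cannot open shared object file" message, the problem is in the container's library setup rather than in onnxruntime itself.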

Inside the container I see:

root@fc13d70325fe:/# echo $LD_LIBRARY_PATH
/usr/local/nvidia/lib:/usr/local/nvidia/lib64

but there are no cuDNN libraries in those directories.
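A rough way to double-check this from Python is to glob each LD_LIBRARY_PATH entry for `libcudnn*`. This is only a sketch: `find_in_ld_library_path` is a hypothetical helper, and it inspects LD_LIBRARY_PATH alone, not the other places the dynamic loader searches (RPATH/RUNPATH, the ldconfig cache, the default system directories).

```python
import glob
import os
from typing import Dict, List


def find_in_ld_library_path(pattern: str = "libcudnn*") -> Dict[str, List[str]]:
    """Hypothetical helper: map each LD_LIBRARY_PATH entry to the files in
    it that match `pattern`. Entries that do not exist map to []."""
    results: Dict[str, List[str]] = {}
    for directory in os.environ.get("LD_LIBRARY_PATH", "").split(":"):
        if not directory:
            continue  # skip empty entries, e.g. from a trailing ':'
        # glob returns [] for a directory that does not exist.
        results[directory] = sorted(glob.glob(os.path.join(directory, pattern)))
    return results


if __name__ == "__main__":
    for directory, matches in find_in_ld_library_path().items():
        print(directory, "->", matches or "no matches")
```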

Does anyone know what is causing the issue? Are the containers not coming pre-installed with cudnn, etc.?

Thank you,

S

Based on the naming of the container, cuDNN seems to be installed, and you could check the version in use via print(torch.backends.cudnn.version()).
The error seems to be raised by onnxruntime, and I don’t know how you’ve built/installed it or what the issue might be.

Thanks @ptrblck !

Yeah, I see that

>>> print(torch.backends.cudnn.version())
8003
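For reference, cuDNN 7.x/8.x encode their version as CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL, so 8003 corresponds to cuDNN 8.0.3. A small decoding sketch follows; the helper name is hypothetical, and later cuDNN major versions use a different encoding, so it should not be applied to them.

```python
from typing import Tuple


def decode_cudnn_version(v: int) -> Tuple[int, int, int]:
    """Hypothetical helper: split a cuDNN 7.x/8.x version integer into
    (major, minor, patch), assuming MAJOR * 1000 + MINOR * 100 + PATCH."""
    major, rest = divmod(v, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


# With the value printed above:
print("%d.%d.%d" % decode_cudnn_version(8003))  # prints 8.0.3
```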

But I can’t find the libcudnn binary anywhere in the container! :confused:

root@fc13d70325fe:/# find / -iname 'libcudnn*'
root@fc13d70325fe:/#

The other dependencies of ONNX Runtime are there, though:

root@fc13d70325fe:/# find / -iname 'libcublas*'
/opt/conda/pkgs/cudatoolkit-11.0.221-h6bb024c_0/lib/libcublasLt.so.11.2.0.252
/opt/conda/pkgs/cudatoolkit-11.0.221-h6bb024c_0/lib/libcublasLt.so
/opt/conda/pkgs/cudatoolkit-11.0.221-h6bb024c_0/lib/libcublas.so.11.2.0.252
/opt/conda/pkgs/cudatoolkit-11.0.221-h6bb024c_0/lib/libcublas.so
/opt/conda/pkgs/cudatoolkit-11.0.221-h6bb024c_0/lib/libcublas.so.11
/opt/conda/pkgs/cudatoolkit-11.0.221-h6bb024c_0/lib/libcublasLt.so.11
/opt/conda/lib/libcublasLt.so.11.2.0.252
/opt/conda/lib/libcublasLt.so
/opt/conda/lib/libcublas.so.11.2.0.252
/opt/conda/lib/libcublas.so
/opt/conda/lib/libcublas.so.11
/opt/conda/lib/libcublasLt.so.11

So, where is the libcudnn binary that pytorch is using? :roll_eyes:

Edit: So I dug into the source code a bit, and it looks like PyTorch has a completely separate implementation of cuDNN inside its own codebase. Is this true?

No, PyTorch uses the official cuDNN release and links it either dynamically or statically.

Note that you are using the runtime container, so nvcc isn’t installed either:

root@f79b17da2a55:/workspace# nvcc --version
bash: nvcc: command not found

Also, the library path doesn’t exist at all:

root@f79b17da2a55:/workspace# ls /usr/local/nvidia
ls: cannot access '/usr/local/nvidia': No such file or directory

If you want to build applications inside the container, use the devel container:

root@389363a6c5ec:/workspace# find /usr/ -name libcudnn.so
/usr/lib/x86_64-linux-gnu/libcudnn.so
root@389363a6c5ec:/workspace# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Wed_Jul_22_19:09:09_PDT_2020
Cuda compilation tools, release 11.0, V11.0.221
Build cuda_11.0_bu.TC445_37.28845127_0

Thanks @ptrblck , always helpful :slight_smile:

But isn’t it odd? The *-runtime image claims to have cuDNN installed (and we see it through torch.backends), but it’s not actually there?

I don’t think I am actually compiling anything inside the container (but I could be wrong; maybe ONNX does something special). I install onnxruntime-gpu through pip, and it fails during import when it tries to load libcudnn and cannot find it. I cannot find cuDNN anywhere in the system myself, so PyTorch must be doing something else here, no?

It’s installed in the PyTorch binaries and is most likely linked statically.
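One way to probe this on Linux is to list the shared objects mapped into a process after cuDNN has been used: if no separate libcudnn*.so shows up in /proc/self/maps, that is consistent with (though not proof of) static linking. A minimal sketch, assuming a Linux host; `mapped_libraries` and `_exercise_cudnn` are hypothetical helpers, and the optional torch import only runs a convolution when a CUDA build and a GPU are available.

```python
import sys
from typing import List


def mapped_libraries(maps_text: str, needle: str = "libcudnn") -> List[str]:
    """Return the distinct pathnames in a /proc/<pid>/maps dump containing `needle`."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [pathname]; maxsplit=5 keeps
        # a pathname that contains spaces in one piece.
        parts = line.split(maxsplit=5)
        if len(parts) == 6 and needle in parts[5]:
            paths.add(parts[5].strip())
    return sorted(paths)


def _exercise_cudnn() -> None:
    """Optionally run one CUDA convolution so cuDNN actually gets initialized."""
    try:
        import torch
    except ImportError:
        return  # PyTorch not installed: just inspect the current process
    if torch.cuda.is_available():
        conv = torch.nn.Conv2d(1, 1, kernel_size=3).cuda()
        conv(torch.randn(1, 1, 8, 8, device="cuda"))
        torch.cuda.synchronize()


if __name__ == "__main__" and sys.platform.startswith("linux"):
    _exercise_cudnn()
    with open("/proc/self/maps") as f:
        found = mapped_libraries(f.read())
    print(found or "no separate libcudnn* shared object mapped")
```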

This would mean that onnxruntime-gpu doesn’t ship with its own statically linked cuDNN, but is trying to load it dynamically from the system installation.

Yes, cuDNN is statically linked into the PyTorch binaries, and the standalone libraries are probably removed afterwards to reduce the image size. If you need the local libs, you would have to use the devel container (or reinstall cuDNN into the runtime container).

Alright, makes sense. Thank you @ptrblck !

Just to help anyone else who ends up here searching for a solution: running

sudo apt install libcudnn8

(or whichever version you need) could help you.
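After installing, a quick check might look like the sketch below: it asks `ctypes.util.find_library` whether the loader can now resolve cuDNN, then retries the onnxruntime import and lists the available providers via `get_available_providers()` (the function imported in the traceback above). `check_after_install` is a hypothetical helper, and the import is wrapped so the script still runs where onnxruntime is not installed.

```python
import ctypes.util
from typing import Dict


def check_after_install() -> Dict[str, object]:
    """Hypothetical helper: report whether cuDNN is now resolvable and
    whether onnxruntime imports cleanly."""
    report: Dict[str, object] = {
        # A file name such as 'libcudnn.so.8' if found, otherwise None.
        "libcudnn": ctypes.util.find_library("cudnn"),
    }
    try:
        # In the version from the traceback, a missing libcudnn surfaces
        # as an OSError raised during this import.
        import onnxruntime as ort
        report["providers"] = ort.get_available_providers()
    except (ImportError, OSError) as exc:
        report["providers"] = f"import failed: {exc}"
    return report


if __name__ == "__main__":
    for key, value in check_after_install().items():
        print(f"{key}: {value}")
```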