PyTorch uses global cuDNN version instead of environment version

Misterion777 · May 10, 2021, 8:49am

Hey everyone. I have installed CUDA 11 and cuDNN 8.2 globally on my machine, but I need exactly PyTorch 1.4.0 for a repo to run, so I created a conda environment and installed:

conda install pytorch==1.4.0 torchvision==0.5.0 cudatoolkit=10.1 -c pytorch

When running code in this environment I get some weird cuDNN errors (e.g. RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED, or CUBLAS_STATUS_EXECUTION_FAILED). The cause might be interference between the environment's cuDNN version and the global one. Here is the output of python -m torch.utils.collect_env in the newly created environment:

Collecting environment information...
PyTorch version: 1.4.0
Is debug build: No
CUDA used to build PyTorch: 10.1

OS: Ubuntu 20.04.2 LTS
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
CMake version: Could not collect

Python version: 3.8
Is CUDA available: Yes
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: GeForce RTX 3070
Nvidia driver version: 460.73.01
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.2.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.2.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.2.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.2.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.2.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.2.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.2.0

Versions of relevant libraries:
[pip3] numpy==1.20.1
[pip3] torch==1.4.0
[pip3] torchvision==0.5.0
[conda] blas                      1.0                         mkl  
[conda] mkl                       2021.2.0           h06a4308_296  
[conda] mkl-service               2.3.0            py38h27cfd23_1  
[conda] mkl_fft                   1.3.0            py38h42c9631_2  
[conda] mkl_random                1.2.1            py38ha9443f7_2  
[conda] pytorch                   1.4.0           py3.8_cuda10.1.243_cudnn7.6.3_0    pytorch
[conda] torchvision               0.5.0                py38_cu101    pytorch

As you can see, PyTorch is installed with cuDNN 7.6.3, but the cuDNN version reported is 8.2.0. Is this the reason for the cuDNN errors, or is the problem in the repo itself?
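
For reference, the CUDA and cuDNN versions bundled with the conda package can be read off the build string on the pytorch line of the output. A minimal sketch, assuming the build string follows the `cuda<version>_cudnn<version>` pattern seen here (the helper name `parse_build_string` is made up for illustration):

```python
import re


def parse_build_string(build):
    """Return (cuda_version, cudnn_version) from a conda build string, or None.

    Assumption: the string contains "cuda<version>_cudnn<version>", as in the
    pytorch line of the collect_env output. Other packages may use other schemes.
    """
    match = re.search(r"cuda(\d+(?:\.\d+)*)_cudnn(\d+(?:\.\d+)*)", build)
    return match.groups() if match else None


# Build string copied from the "[conda] pytorch" line.
print(parse_build_string("py3.8_cuda10.1.243_cudnn7.6.3_0"))  # ('10.1.243', '7.6.3')

# The torchvision build string does not follow this pattern, so nothing is extracted.
print(parse_build_string("py38_cu101"))  # None
```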

ptrblck · May 10, 2021, 8:55am

The global cuDNN and CUDA installations won’t be used if you install the binaries, since the binaries ship with their own CUDA and cuDNN libraries.
However, since you are using an Ampere GPU, you would need CUDA>=11.0 with cudnn>=8, which aren’t used in the PyTorch 1.4.0 binaries.
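
To check this from inside the environment, something like the following sketch could be used. It asks PyTorch itself which CUDA and cuDNN versions it was built with and loaded, rather than relying on the "Probably one of the following" file list from collect_env. The helper names are made up for illustration, the version decoding assumes the integer scheme used by cuDNN 7 and 8 (e.g. 7603 for 7.6.3), and the compatibility check only encodes the one rule mentioned above (compute capability 8.x needs CUDA>=11.0).

```python
def decode_cudnn_version(value):
    """Decode the integer from torch.backends.cudnn.version(), e.g. 7603 -> (7, 6, 3).

    Assumption: the cuDNN 7/8 scheme, major * 1000 + minor * 100 + patch.
    """
    major, rest = divmod(value, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


def build_supports_gpu(cuda_version, capability):
    """Rough check: Ampere GPUs (compute capability 8.x) need CUDA >= 11.0.

    Older architectures are not checked in this sketch and are reported as fine.
    """
    cuda_major = int(cuda_version.split(".")[0])
    if capability[0] >= 8:
        return cuda_major >= 11
    return True


def report():
    """Print what the installed PyTorch build reports about itself."""
    try:
        import torch
    except ImportError:
        print("torch is not installed in this environment")
        return
    print("PyTorch:", torch.__version__)
    print("CUDA used to build PyTorch:", torch.version.cuda)
    cudnn = torch.backends.cudnn.version()
    if cudnn is not None:
        print("cuDNN loaded by PyTorch:", decode_cudnn_version(cudnn))
    if torch.version.cuda and torch.cuda.is_available():
        capability = torch.cuda.get_device_capability(0)
        print("GPU compute capability:", capability)
        print("Build looks compatible:", build_supports_gpu(torch.version.cuda, capability))


if __name__ == "__main__":
    report()
```

For the setup in this thread (CUDA 10.1 build, RTX 3070 with compute capability 8.6), `build_supports_gpu("10.1", (8, 6))` returns False.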

Misterion777 · May 10, 2021, 9:15am

I see, thanks for the quick answer. Just to make sure, it is possible to build PyTorch 1.4.0 from source with the latest CUDA, right?

ptrblck · May 10, 2021, 9:17am

No, I don’t think it will work directly, since a few changes were needed to be able to build with CUDA>=11.0 and cudnn>=8, so you could run into build issues.

Misterion777 · May 10, 2021, 9:19am

Got it, I will try to solve my issue in the repo itself. Thank you very much!