Hey everyone. I have CUDA 11 and cuDNN 8.2 installed globally on my machine, but I need exactly PyTorch 1.4.0 for a repo to run, so I created a conda environment and installed:
conda install pytorch==1.4.0 torchvision==0.5.0 cudatoolkit=10.1 -c pytorch
When running some code in this environment I get some weird cuDNN errors (e.g. RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED or CUBLAS_STATUS_EXECUTION_FAILED). It seems the cause could be interference between the environment's cuDNN version and the global cuDNN. Here is the output of python -m torch.utils.collect_env in the newly created environment:
Collecting environment information...
PyTorch version: 1.4.0
Is debug build: No
CUDA used to build PyTorch: 10.1
OS: Ubuntu 20.04.2 LTS
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
CMake version: Could not collect
Python version: 3.8
Is CUDA available: Yes
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: GeForce RTX 3070
Nvidia driver version: 460.73.01
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.2.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.2.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.2.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.2.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.2.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.2.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.2.0
Versions of relevant libraries:
[pip3] numpy==1.20.1
[pip3] torch==1.4.0
[pip3] torchvision==0.5.0
[conda] blas 1.0 mkl
[conda] mkl 2021.2.0 h06a4308_296
[conda] mkl-service 2.3.0 py38h27cfd23_1
[conda] mkl_fft 1.3.0 py38h42c9631_2
[conda] mkl_random 1.2.1 py38ha9443f7_2
[conda] pytorch 1.4.0 py3.8_cuda10.1.243_cudnn7.6.3_0 pytorch
[conda] torchvision 0.5.0 py38_cu101 pytorch
As you can see, PyTorch was installed with cuDNN 7.6.3 (per the conda build string), but the cuDNN version reported is 8.2.0. Is this mismatch the reason for the cuDNN errors, or is the problem in the repo itself?
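Since collect_env only says "Probably one of the following" and lists the libraries it finds on disk, one way to see which cuDNN PyTorch actually reports is to ask it directly. Below is a minimal sketch, not a definitive diagnostic. The helper names decode_cudnn_version and report_versions are made up for illustration, and the decoding assumes the integer encoding used by cuDNN 7.x/8.x (major*1000 + minor*100 + patch).

```python
def decode_cudnn_version(v):
    """Split cuDNN's integer version (e.g. 7603) into (major, minor, patch).

    Assumption: the cuDNN 7.x/8.x encoding, major*1000 + minor*100 + patch.
    """
    return v // 1000, (v % 1000) // 100, v % 100


def report_versions():
    """Return the versions PyTorch itself reports, or None if torch is not importable."""
    try:
        import torch
    except ImportError:
        return None
    # An int such as 7603; may be None when cuDNN is not usable.
    loaded = torch.backends.cudnn.version()
    return {
        "torch": torch.__version__,
        "cuda_built_with": torch.version.cuda,
        "cudnn_loaded": decode_cudnn_version(loaded) if loaded is not None else None,
    }


if __name__ == "__main__":
    print(report_versions())
```

If this is run inside the conda environment and cudnn_loaded matches the 7.6.3 from the build string, the system-wide 8.2.0 libraries are probably not the ones being used.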