Why can't I use cuDNN on an RTX 2080 Ti when it works on a GTX 1080 Ti?

LuoXin-s · March 25, 2021, 10:06am

Here is my environment.

OS: Ubuntu 16.04.7 LTS (x86_64)
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
Clang version: Could not collect
CMake version: Could not collect

Python version: 3.8 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: 10.2.89
GPU models and configuration: GPU 0: GeForce RTX 2080 Ti
Nvidia driver version: 440.82
cuDNN version: /usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.5
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.19.5
[pip3] pytorch-lightning==1.2.4
[pip3] torch==1.8.0
[pip3] torchaudio==0.8.0
[pip3] torchvision==0.9.0
[conda] numpy 1.19.5 pypi_0 pypi
[conda] pytorch-lightning 1.2.4 pypi_0 pypi
[conda] torch 1.8.0 pypi_0 pypi
[conda] torchaudio 0.8.0 pypi_0 pypi
[conda] torchvision 0.9.0 pypi_0 pypi

My code raises RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED,
even though torch.backends.cudnn.enabled returns True.

Below is the full traceback:

Traceback (most recent call last):
  File "/ghome/luoxin/projects/liif-lightning-hydra/run.py", line 38, in main
    return test(config)
  File "/ghome/luoxin/projects/liif-lightning-hydra/src/test.py", line 63, in test
    trainer.test(model=trained_model, datamodule=datamodule)
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 914, in test
    results = self.__test_given_model(model, test_dataloaders)
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 972, in __test_given_model
    results = self.fit(model)
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 498, in fit
    self.dispatch()
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 539, in dispatch
    self.accelerator.start_testing(self)
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 76, in start_testing
    self.training_type_plugin.start_testing(trainer)
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 118, in start_testing
    self._results = trainer.run_test()
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 785, in run_test
    eval_loop_results, _ = self.run_evaluation()
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 724, in run_evaluation
    output = self.evaluation_loop.evaluation_step(batch, batch_idx, dataloader_idx)
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 160, in evaluation_step
    output = self.trainer.accelerator.test_step(args)
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 195, in test_step
    return self.training_type_plugin.test_step(*args)
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 134, in test_step
    return self.lightning_module.test_step(*args, **kwargs)
  File "/ghome/luoxin/projects/liif-lightning-hydra/src/models/liif.py", line 183, in test_step
    preds = super_resolution(model=self, x=lr.unsqueeze(0), target_resolution=gt_size, bsize=30000)
  File "/ghome/luoxin/projects/liif-lightning-hydra/src/super_resolution.py", line 11, in super_resolution
    feature = model.encoder(x)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/ghome/luoxin/projects/liif-lightning-hydra/src/architectures/edsr.py", line 68, in forward
    x = self.head(x)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/container.py", line 119, in forward
    input = module(input)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 399, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 395, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED
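
Since the traceback ends in a plain F.conv2d call, the failure can probably be isolated from the Lightning code with a small standalone check. The following is only a minimal sketch: the function name cudnn_smoke_test and the layer and tensor sizes are made up for illustration, and it assumes a single convolution forward pass is enough to trigger the same cuDNN initialization.

```python
def cudnn_smoke_test():
    """Run one small Conv2d forward pass on the GPU and describe the outcome."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"

    if not torch.cuda.is_available():
        return "CUDA is not available"
    if not torch.backends.cudnn.is_available():
        return "cuDNN is not available in this build"

    try:
        # Arbitrary small sizes; any convolution should go through cuDNN.
        conv = torch.nn.Conv2d(3, 8, kernel_size=3, padding=1).cuda()
        x = torch.randn(1, 3, 32, 32, device="cuda")
        with torch.no_grad():
            y = conv(x)
        # Wait for the kernel so that asynchronous errors surface here.
        torch.cuda.synchronize()
    except RuntimeError as err:
        return f"failed: {err}"

    return f"ok: output shape {tuple(y.shape)}, cuDNN {torch.backends.cudnn.version()}"


if __name__ == "__main__":
    print(cudnn_smoke_test())
```

If this reports the same CUDNN_STATUS_NOT_INITIALIZED message, the problem likely lies in the installed binaries rather than in the model code.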

ptrblck · March 25, 2021, 10:30am

You are most likely running into this issue.
As a workaround, you could install the nightly binaries, the pip wheels with CUDA 11.1, or any conda binary.

LuoXin-s · March 25, 2021, 11:29am

Is there any other way to resolve this issue? I have to work with CUDA 10.2 and cuDNN 7.6.5 for now.

ptrblck · March 25, 2021, 9:53pm

Yes, you can use the 1.8.1 pip wheels, which were released today.
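
After upgrading, it may help to confirm which torch version the environment actually picks up. Here is a rough sketch of such a check; parse_release and has_fix are hypothetical helper names, and the (1, 8, 1) threshold is simply taken from the suggestion above, not from any official compatibility table.

```python
import re


def parse_release(version):
    """Return the leading numeric release of a version string as a tuple of ints."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    # A missing patch component is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def has_fix(version, minimum=(1, 8, 1)):
    """Assumption: releases at or above `minimum` avoid the cuDNN issue."""
    return parse_release(version) >= minimum


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed")
    else:
        # Local suffixes such as "+cu102" are ignored by parse_release.
        print(torch.__version__, "->", has_fix(torch.__version__))
```

With the environment from the first post, torch==1.8.0 falls below the threshold, while a 1.8.1 wheel passes it.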