CUDNN_STATUS_EXECUTION_FAILED runtime error

Traceback (most recent call last):
  File "main.py", line 46, in <module>
    main(config, args)
  File "main.py", line 19, in main
    Mission.train()
  File "/opt/notebooks/TextZoom/src/interfaces/super_resolution.py", line 33, in train
    model_dict = self.generator_init()
  File "/opt/notebooks/TextZoom/src/interfaces/base.py", line 149, in generator_init
    model = model.to(self.device)
  File "/root/.conda/envs/textZoom/lib/python3.6/site-packages/torch/nn/modules/module.py", line 381, in to
    return self._apply(convert)
  File "/root/.conda/envs/textZoom/lib/python3.6/site-packages/torch/nn/modules/module.py", line 187, in _apply
    module._apply(fn)
  File "/root/.conda/envs/textZoom/lib/python3.6/site-packages/torch/nn/modules/module.py", line 187, in _apply
    module._apply(fn)
  File "/root/.conda/envs/textZoom/lib/python3.6/site-packages/torch/nn/modules/module.py", line 187, in _apply
    module._apply(fn)
  File "/root/.conda/envs/textZoom/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 117, in _apply
    self.flatten_parameters()
  File "/root/.conda/envs/textZoom/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 113, in flatten_parameters
    self.batch_first, bool(self.bidirectional))
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

I’m facing this problem when moving the model to the GPU. Do you have any suggestions?

See if this helps.

https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html

WARNING

There are known non-determinism issues for RNN functions on some versions of cuDNN and CUDA. You can enforce deterministic behavior by setting the following environment variables:

On CUDA 10.1, set environment variable CUDA_LAUNCH_BLOCKING=1. This may affect performance.

On CUDA 10.2 or later, set environment variable (note the leading colon symbol) CUBLAS_WORKSPACE_CONFIG=:16:8 or CUBLAS_WORKSPACE_CONFIG=:4096:2.

See the cuDNN 8 Release Notes for more information.
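For anyone trying the variables from the quoted warning: they are read by the CUDA libraries, so setting them at the very top of the script, before importing torch, is the safe choice. Below is a minimal sketch of that; the helper name configure_deterministic_env is made up for illustration, and the workspace value is one of the two options listed in the warning. Note that the warning is about non-determinism, so this may well not resolve a CUDNN_STATUS_EXECUTION_FAILED error.

```python
import os


def configure_deterministic_env(cuda_version):
    """Set the variable the PyTorch docs suggest for a given CUDA version.

    cuda_version is a (major, minor) tuple such as (10, 2).
    Returns the name of the variable that was set, or None.
    (Hypothetical helper, not part of PyTorch.)
    """
    if cuda_version == (10, 1):
        # May affect performance, as the warning notes.
        os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
        return "CUDA_LAUNCH_BLOCKING"
    if cuda_version >= (10, 2):
        # Note the leading colon; ":16:8" is the other documented option.
        os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:2"
        return "CUBLAS_WORKSPACE_CONFIG"
    return None


# Example: CUDA 11.2, as reported later in this thread.
configure_deterministic_env((11, 2))

# Import torch only after the environment has been configured:
# import torch
```

The same effect can be had by exporting the variable in the shell before launching main.py.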

Could you post an executable code snippet to reproduce this issue (model definition and input shape information might be enough) as well as information about your setup (PyTorch, CUDA, cudnn version, used GPU)?
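The setup information asked for here can be gathered with a few lines. This is a rough sketch; collect_setup_info is a hypothetical helper name, and the import is guarded only so the script still runs where PyTorch is missing.

```python
import platform


def collect_setup_info():
    """Collect version details that are useful in a bug report."""
    info = {"python": platform.python_version()}
    try:
        import torch
    except ImportError:
        info["torch"] = None  # PyTorch is not installed in this environment
        return info

    info["torch"] = torch.__version__
    info["cuda_build"] = torch.version.cuda          # CUDA version PyTorch was built with
    info["cudnn"] = torch.backends.cudnn.version()   # None if cuDNN is unavailable
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        info["gpu"] = torch.cuda.get_device_name(0)
    return info


for key, value in collect_setup_info().items():
    print(f"{key}: {value}")
```

Keep in mind that torch.version.cuda reports the CUDA runtime PyTorch was built against, which can differ from the toolkit or driver version installed on the machine.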

I think I have probably found the cause: I am currently using CUDA 11.2 with PyTorch 1.7.1.

Did you build PyTorch from source using this setup? If so, which cudnn version were you using?
A code snippet to reproduce this issue would still be needed in order to debug and fix it.
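As a starting point for such a snippet: the traceback fails in flatten_parameters() while an RNN module is being moved to the device, so a stripped-down check could look like the sketch below. The layer sizes and input shape are placeholders, not taken from the TextZoom model, and the function name is made up for illustration.

```python
def try_move_lstm_to_gpu():
    """Move a small bidirectional LSTM to the GPU and run one forward pass."""
    try:
        import torch
        import torch.nn as nn
    except ImportError:
        return "torch not installed"
    if not torch.cuda.is_available():
        return "CUDA not available"

    # Placeholder sizes; the real model's dimensions are unknown here.
    lstm = nn.LSTM(input_size=32, hidden_size=64,
                   bidirectional=True, batch_first=True)

    # .to() calls flatten_parameters() on RNN modules, which is where
    # the traceback in this thread fails.
    lstm = lstm.to(torch.device("cuda"))

    x = torch.randn(4, 10, 32, device="cuda")  # (batch, seq_len, features)
    out, _ = lstm(x)
    return tuple(out.shape)


print(try_move_lstm_to_gpu())
```

If this small case already raises the cuDNN error, the problem is more likely in the PyTorch/CUDA/cuDNN installation than in the model code.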