RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

[ Fri Feb 28 20:05:08 2025 ] Parameters:
{'work_dir': './work_dir/baseline_res18/', 'config': './configs/baseline.yaml', 'random_fix': True, 'device': '0', 'phase': 'train', 'save_interval': 5, 'random_seed': 0, 'eval_interval': 1, 'print_log': True, 'log_interval': 10000, 'evaluate_tool': 'sclite', 'feeder': 'dataset.dataloader_video.BaseFeeder', 'dataset': 'phoenix2014-T', 'dataset_info': {'dataset_root': './dataset/phoenix2014-T', 'dict_path': './preprocess/phoenix2014-T/gloss_dict.npy', 'evaluation_dir': './evaluation/slr_eval', 'evaluation_prefix': 'phoenix2014-T-groundtruth'}, 'num_worker': 5, 'feeder_args': {'mode': 'test', 'datatype': 'video', 'num_gloss': -1, 'drop_ratio': 1.0, 'frame_interval': 1, 'image_scale': 1.0, 'input_size': 224, 'prefix': './dataset/phoenix2014-T', 'transform_mode': False}, 'model': 'slr_network.SLRModel', 'model_args': {'num_classes': 1116, 'c2d_type': 'resnet18', 'conv_type': 2, 'use_bn': 1, 'share_classifier': True, 'weight_norm': True}, 'load_weights': None, 'load_checkpoints': None, 'decode_mode': 'beam', 'ignore_weights': , 'batch_size': 1, 'test_batch_size': 1, 'loss_weights': {'SeqCTC': 1.0, 'ConvCTC': 1.0, 'Dist': 25.0}, 'optimizer_args': {'optimizer': 'Adam', 'base_lr': 0.0001, 'step': [20, 35], 'learning_ratio': 1, 'weight_decay': 0.0001, 'start_epoch': 0, 'nesterov': False}, 'num_epoch': 40}

0%| | 0/7096 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "/media/mohan/mohan/CorrNet/main.py", line 255, in <module>
    processor.start()
  File "/media/mohan/mohan/CorrNet/main.py", line 67, in start
    seq_train(self.data_loader['train'], self.model, self.optimizer,
  File "/media/mohan/mohan/CorrNet/seq_scripts.py", line 28, in seq_train
    loss = model.criterion_calculation(ret_dict, label, label_lgt)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/media/mohan/mohan/CorrNet/slr_network.py", line 122, in criterion_calculation
    loss += weight * self.loss['CTCLoss'](ret_dict["sequence_logits"].log_softmax(-1),
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/media/mohan/anaconda3/envs/corrnet/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/media/mohan/anaconda3/envs/corrnet/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/media/mohan/anaconda3/envs/corrnet/lib/python3.12/site-packages/torch/nn/modules/loss.py", line 1980, in forward
    return F.ctc_loss(
           ^^^^^^^^^^^
  File "/media/mohan/anaconda3/envs/corrnet/lib/python3.12/site-packages/torch/nn/functional.py", line 3069, in ctc_loss
    return torch.ctc_loss(
           ^^^^^^^^^^^^^^^
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

If your code still fails in the latest nightly or stable release, could you post a minimal and executable code snippet, please?
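While cutting the script down to a repro, it may also be worth ruling out invalid CTC inputs first, since bad labels or lengths passed to `ctc_loss` on the GPU can surface as opaque CUDA/cuDNN errors. The sketch below is a plain-Python check under stated assumptions: `check_ctc_inputs` is a hypothetical helper (not part of PyTorch or CorrNet), the targets are assumed to be concatenated into one flat list, and the blank index is assumed to be 0 (the `torch.nn.CTCLoss` default). Whether any of these conditions explains the traceback above is not established.

```python
def check_ctc_inputs(input_lengths, targets, target_lengths, num_classes, blank=0):
    """Return a list of problems found in one CTC batch (empty list = none found).

    Hypothetical helper: all arguments are plain Python values, e.g. obtained
    with tensor.cpu().tolist() just before the loss call.
    Assumes `targets` is the flat concatenation of every sample's labels.
    """
    problems = []

    # One input length and one target length per sample.
    if len(input_lengths) != len(target_lengths):
        problems.append(
            f"{len(input_lengths)} input lengths vs {len(target_lengths)} target lengths"
        )

    # Concatenated targets must add up to the declared target lengths.
    if sum(target_lengths) != len(targets):
        problems.append(
            f"sum(target_lengths)={sum(target_lengths)} but len(targets)={len(targets)}"
        )

    # A target cannot be longer than the sequence it is aligned to.
    for i, (t_in, t_out) in enumerate(zip(input_lengths, target_lengths)):
        if t_out > t_in:
            problems.append(f"sample {i}: target length {t_out} > input length {t_in}")

    # Labels must be valid class indices and must not be the blank index.
    for pos, label in enumerate(targets):
        if label == blank or not 0 <= label < num_classes:
            problems.append(f"targets[{pos}]={label} is blank or outside [0, {num_classes})")

    return problems


if __name__ == "__main__":
    # num_classes=1116 is taken from model_args in the log; the other values are made up.
    print(check_ctc_inputs([50], [3, 7, 1115], [3], num_classes=1116))
    print(check_ctc_inputs([2], [3, 7, 1116], [3], num_classes=1116))
```

If the check reports nothing for the failing batch, the same values (moved to the GPU as tensors) are a reasonable starting point for the minimal snippet requested above.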

(corrnet) mohan@debanga-Precision-7820-Tower:/media/mohan/mohan/CorrNet$ python
Python 3.12.9 | packaged by Anaconda, Inc. | (main, Feb 6 2025, 18:56:27) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print("Torch version:", torch.__version__)
Torch version: 2.5.1+cu121
>>> print("CUDA version:", torch.version.cuda)
CUDA version: 12.1
>>> print("cuDNN version:", torch.backends.cudnn.version())
cuDNN version: 90701
>>> exit(0
... )
(corrnet) mohan@debanga-Precision-7820-Tower:/media/mohan/mohan/CorrNet$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0

I am currently working on this repo

The following is the code of slr_network.py:

The following is my main.py file: