RNN giving CUDNN_STATUS_SUCCESS Error

I have an RNN model that throws this strange exception. I am using PyTorch 0.4 since this is old code that I am still trying to upgrade (I would still really like to have it running for comparison).

I have CUDA 10.1 installed, and it seems only the LSTM-based model is causing issues. Any help would be highly appreciated.

    self.lang_model.cuda()
  File "/home/chinmay/Desktop/setup/3dsis/lib/python3.6/site-packages/torch/nn/modules/module.py", line 258, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/home/chinmay/Desktop/setup/3dsis/lib/python3.6/site-packages/torch/nn/modules/module.py", line 185, in _apply
    module._apply(fn)
  File "/home/chinmay/Desktop/setup/3dsis/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 112, in _apply
    self.flatten_parameters()
  File "/home/chinmay/Desktop/setup/3dsis/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 105, in flatten_parameters
    self.batch_first, bool(self.bidirectional))
RuntimeError: CuDNN error: CUDNN_STATUS_SUCCESS

Could you update to the latest stable PyTorch version (1.5.1) and post a code snippet to reproduce this issue, please?

Hi @ptrblck, the issue does not happen with the latest version of PyTorch; it's just that I have older code that I am still porting. Interestingly, I found an old issue which suggests trying to put the model on CUDA twice, and that got it working for me (5 minutes before your message :slightly_smiling_face: ).

Hi, my problem is similar to the post above. Specifically, my GPU server runs CentOS Linux 7.8.2003 (Core) with PyTorch 0.4.1 and two CUDA versions, 9.0 and 10.1 (10.1 is the one I use). I am reproducing someone else's program (cloned from GitHub), and when I run it the system throws a similar exception.

Thanks for any help!

Could you update to the latest PyTorch release (1.10.0) as 0.4.1 was released in July 2018 and is quite old by now?

The program I want to reproduce requires PyTorch 0.4.1 and other specific environment versions, so I have no choice!

In that case you could disable cuDNN via torch.backends.cudnn.enabled = False and see if your code could run.

You mean I need to set torch.backends.cudnn.enabled = False? Which file do I need to add this line to? Or do I need to modify some global config file, such as my .bashrc? Could you give me a suggestion? Thank you very much, I'll try it now!

You can add it into your main script after importing torch.
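For example, a minimal sketch of the top of the script that builds the model:

    import torch

    # Disable cuDNN globally; RNN layers will then fall back to PyTorch's
    # native CUDA implementation instead of calling into cuDNN.
    torch.backends.cudnn.enabled = False

    # ... build your model and call .cuda() as usual afterwards ...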

@bibo I honestly think catching the exception and putting the model on CUDA twice is perhaps the easiest option. Did it not work for you?

OK, I'll try it now! Thanks a lot!

Thank you! I'd also like to know how I can put the model on CUDA twice. Could you give me the steps?

This is what I used:

# Work around the cuDNN error by trying to put the model on CUDA twice
try:
    self.lang_model.cuda()
except RuntimeError:
    # the first call can fail with the CuDNN error; retrying once succeeds
    self.lang_model.cuda()

Where can I add these lines? Which file should this code go in? Thank you!

Can you please paste the snippet of your main.py file or any other file you are using for the initial setup?

Thanks for your help! The program I am reproducing is from GitHub (GitHub - zhangj111/rencos). Its main.py is run.py in that repository, with the following content:
import os
import sys
import time


def main(opt, mode=2):
    if opt == 'preprocess':
        command = "python preprocess.py -train_src samples/%s/train/train.spl.src \
                   -train_tgt samples/%s/train/train.txt.tgt \
                   -valid_src samples/%s/valid/valid.spl.src \
                   -valid_tgt samples/%s/valid/valid.txt.tgt \
                   -save_data samples/%s/preprocessed/baseline_spl \
                   -src_seq_length 10000 \
                   -tgt_seq_length 10000 \
                   -src_seq_length_trunc %d \
                   -tgt_seq_length_trunc %d" % (lang, lang, lang, lang, lang, src_len, tgt_len)
        os.system(command)
    elif opt == 'train':
        command = "python train.py -word_vec_size 256 \
                   -layers 1 \
                   -rnn_size 512 \
                   -rnn_type LSTM \
                   -global_attention mlp \
                   -data samples/%s/preprocessed/baseline_spl \
                   -save_model models/%s/baseline_spl \
                   -gpu_ranks 0 \
                   -batch_size 32 \
                   -optim adam \
                   -learning_rate 0.001 \
                   -dropout 0 \
                   -encoder_type brnn" % (lang, lang)
        os.system(command)
    elif opt == 'retrieval':
        print('Syntactic level...')
        command1 = "python syntax.py %s" % lang
        os.system(command1)
        print('Semantic level...')
        batch_size = 32 if lang == 'python' else 16
        command2 = "python translate.py -model models/%s/baseline_spl_step_100000.pt \
                    -src samples/%s/train/train.spl.src \
                    -output samples/%s/output/test.out \
                    -batch_size %d \
                    -gpu 0 \
                    -fast \
                    -max_sent_length %d \
                    -refer 0 \
                    -lang %s \
                    -search 2" % (lang, lang, lang, batch_size, src_len, lang)
        os.system(command2)
        command3 = "python translate.py -model models/%s/baseline_spl_step_100000.pt \
                    -src samples/%s/test/test.spl.src \
                    -output samples/%s/test/test.ref.src.1 \
                    -batch_size 32 \
                    -gpu 0 \
                    -fast \
                    -max_sent_length %d \
                    -refer 0 \
                    -lang %s \
                    -search 2" % (lang, lang, lang, src_len, lang)
        os.system(command3)
        print('Normalize...')
        command4 = "python normalize.py %s" % lang
        os.system(command4)
    elif opt == 'translate':
        command = "python translate.py -model models/%s/baseline_spl_step_100000.pt \
                   -src samples/%s/test/test.spl.src \
                   -output samples/%s/output/test.out \
                   -min_length 3 \
                   -max_length %d \
                   -batch_size 32 \
                   -gpu 0 \
                   -fast \
                   -max_sent_length %d \
                   -refer %d \
                   -lang %s \
                   -beam 5" % (lang, lang, lang, tgt_len, src_len, mode, lang)
        os.system(command)
    print('Done.')


if __name__ == '__main__':
    option = sys.argv[1]
    lang = sys.argv[2]
    assert option in ['preprocess', 'train', 'retrieval', 'translate', 'all']
    assert lang in ['python', 'java']
    if lang == 'python':
        src_len, tgt_len = 100, 50
    elif lang == 'java':
        src_len, tgt_len = 300, 30
    else:
        print("Unsupported Programming Language:", lang)
    if option == 'all':
        main('preprocess')
        main('train')
        main('retrieval')
        main('translate')
    else:
        if option == 'translate':
            mode = int(sys.argv[3])
            main(option, mode)
        else:
            main(option)

Hi, the problem I posted before has been solved, thank you for your help! Now I've run into another problem when I translate and generate output: "RuntimeError: CUDA error: out of memory". I think it may be caused by the limited memory of our GPU server's card, so I want to reduce the batch size until it fits into the GPU memory, but I don't know which file I need to change.

Based on your previously posted code, you are setting the batch size via:

-batch_size 32 

so you could reduce this value until the memory usage fits into the available GPU memory.
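For example, in the translate branch of the run.py you posted, a sketch of the change could look like this (8 is just an example value; pick whatever fits your GPU memory):

    # run.py, 'translate' branch: pass a smaller batch size to translate.py
    command = "python translate.py -model models/%s/baseline_spl_step_100000.pt \
               -src samples/%s/test/test.spl.src \
               -output samples/%s/output/test.out \
               -min_length 3 \
               -max_length %d \
               -batch_size 8 \
               -gpu 0 \
               -fast \
               -max_sent_length %d \
               -refer %d \
               -lang %s \
               -beam 5" % (lang, lang, lang, tgt_len, src_len, mode, lang)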

OK, so you mean I can modify run.py and reset the batch size there! Are there any other Python files I need to modify at the same time? Thanks! I'll try it!