cuDNN error with simple code


(Yoohv Zo) #1

My information:

system: Ubuntu 18.04
python: 3.7.1
torch: 1.0.1.post2

Code:

import torch
import torch.nn as nn


class RNN_ENCODER(nn.Module):
    def __init__(self, num_embeddings, input_size=300, drop_prob=0.5,
                 hidden_size=256, layers=1, bidirectional=True):
        super().__init__()
        self.encoder = nn.Embedding(num_embeddings, input_size)
        self.drop = nn.Dropout(drop_prob)
        if bidirectional:
            hidden_size //= 2
        drop_prob = 0 if layers == 1 else drop_prob
        self.rnn = nn.LSTM(input_size, hidden_size, layers, batch_first=True,
                           dropout=drop_prob, bidirectional=bidirectional)
        # self.encoder.weight.data.uniform_(-0.1, 0.1)

    def forward(self, captions, hidden):
        emb = self.drop(self.encoder(captions))
        output, _ = self.rnn(emb, hidden)
        return output


if __name__ == '__main__':
    rnn = RNN_ENCODER(num_embeddings=100).cuda()

Error:

Traceback (most recent call last):
  File "src/Models/TextEncoder.py", line 25, in <module>
    rnn = RNN_ENCODER(num_embeddings=100).cuda()
  File "/home/zyoohv/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 260, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/home/zyoohv/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 187, in _apply
    module._apply(fn)
  File "/home/zyoohv/anaconda3/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 117, in _apply
    self.flatten_parameters()
  File "/home/zyoohv/anaconda3/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 113, in flatten_parameters
    self.batch_first, bool(self.bidirectional))
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

#2

Could you try to disable cuDNN using torch.backends.cudnn.enabled = False and run your code again?
The code you’ve posted here runs fine on my machine, so the error might come from some setup issue on your machine or the way you are using your model.
Here is a small code snippet that works with the model above:

x = torch.randint(0, 100, (1, 1)).to('cuda')
hidden = torch.randn(2, 1, 128).to('cuda')
state = torch.randn(2, 1, 128).to('cuda')
output = rnn(x, (hidden, state))
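
For anyone trying the suggested workaround, here is a minimal, self-contained sketch. The traceback above shows the error being raised inside .cuda(), so the flag is presumably only useful if it is set before the model is moved to the GPU. The bare nn.LSTM stands in for the RNN_ENCODER above (same sizes, embedding and dropout left out), and the CPU fallback is an addition so the sketch also runs on a machine without a GPU.

```python
import torch
import torch.nn as nn

# Workaround discussed in this thread: turn cuDNN off *before* the model is
# moved to the GPU, since the reported error is raised during .cuda().
torch.backends.cudnn.enabled = False

# Assumption: fall back to the CPU when no GPU is visible to PyTorch.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Same sizes as the encoder above: bidirectional, 256 // 2 = 128 hidden units.
lstm = nn.LSTM(300, 128, 1, batch_first=True, bidirectional=True).to(device)

emb = torch.randn(1, 1, 300, device=device)     # (batch, seq_len, input_size)
hidden = torch.randn(2, 1, 128, device=device)  # (num_directions, batch, hidden)
state = torch.randn(2, 1, 128, device=device)

output, _ = lstm(emb, (hidden, state))
print(output.shape)  # torch.Size([1, 1, 256])
```

With cuDNN disabled PyTorch uses its native LSTM kernels, which is typically slower, so this is more of a diagnostic step than a fix.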

(Yoohv Zo) #3

Thank you very much!
It works for me after setting torch.backends.cudnn.enabled = False.


#4

Good to hear, but that’s not really a solution.
Could you post which PyTorch, CUDA and cuDNN versions you are using:

print(torch.__version__)
print(torch.version.cuda)
print(torch.backends.cudnn.version())

so that I could try to reproduce this issue?


(Yoohv Zo) #5

The output of the code is:

1.0.1.post2
9.0.176
7402

I would also really like to know the cause of the error.
Thank you.
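
As a side note on reading the output above: the cuDNN version is reported as a single integer. A small sketch of how it can be decoded, assuming the encoding used by cuDNN releases before 9 (major * 1000 + minor * 100 + patch); decode_cudnn_version is a hypothetical helper name, not part of PyTorch.

```python
def decode_cudnn_version(version):
    """Split an integer such as 7402 into (major, minor, patch).

    Assumes the pre-9 cuDNN encoding: major * 1000 + minor * 100 + patch.
    """
    major, rest = divmod(version, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


print(decode_cudnn_version(7402))  # (7, 4, 2)
```

So the setup reported here is PyTorch 1.0.1.post2 with CUDA 9.0.176 and cuDNN 7.4.2.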


#6

Thanks for the information.
I just created a conda environment with the same versions and couldn’t reproduce the error.
Which GPU are you using? The code runs fine on a GTX1080Ti.
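
To answer the GPU question, something along these lines should print the device name and compute capability. This is a sketch: describe_gpu is a hypothetical helper, and it assumes the GPU of interest is device index 0.

```python
import torch


def describe_gpu(index=0):
    """Return a short description of the GPU at `index`, if one is visible."""
    if not torch.cuda.is_available():
        return 'no CUDA device visible to PyTorch'
    name = torch.cuda.get_device_name(index)
    major, minor = torch.cuda.get_device_capability(index)
    return '{} (compute capability {}.{})'.format(name, major, minor)


print(describe_gpu())
```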