Hi,
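I have a standard encoder architecture and am trying to run it on this input sequence:

```python
import torch
from model import EncoderRNN  # encoder class defined in model.py (shown below)
device = torch.device("cuda")
x_input = torch.tensor([0, 83, 0, 1257, 0, 25, 34, 239, 31, 275, 1057, 171, 7, 809, 174], device=device)
enc = EncoderRNN(10889, 128, True).to(device)  # vocab size 10889, hidden size 128, bidirectional
x_h = enc.initHidden()
x_out, x_h = enc(x_input, x_h)
```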
But I keep getting the error `cuDNN error: CUDNN_STATUS_EXECUTION_FAILED` when the embedding is passed to the GRU. The model works fine on CPU. I didn't have this error until I tried to install allennlp, and that installation was interrupted because of an incompatibility between torch and CUDA.
Traceback:

```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 52, in <module>
  File "/anaconda2/envs/mypython3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "model.py", line 46, in forward
    output, x_h = self.gru(output) # output: length x batch x 2H, hidden: 2 x 1 x H
  File "/anaconda2/envs/mypython3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/anaconda2/envs/mypython3/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 179, in forward
    self.dropout, self.training, self.bidirectional, self.batch_first)
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED
```
Encoder definition:
```python
import torch
import torch.nn as nn

class EncoderRNN(nn.Module):
    def __init__(self, input_size, hidden_size, biflag):
        super(EncoderRNN, self).__init__()
        self.hidden_size = hidden_size
        self.biflag = biflag
        self.embedding = nn.Embedding(input_size, hidden_size)  # input size is vocab size
        if biflag:
            self.gru = nn.GRU(hidden_size, hidden_size, bidirectional=True)
        else:
            self.gru = nn.GRU(hidden_size, hidden_size, bidirectional=False)

    def forward(self, input, hidden):
        embedded = self.embedding(input)  # length x H
        # nn.GRU defaults to batch_first=False, so the batch axis goes second: length x 1 x H
        output = embedded.unsqueeze(1)
        # `hidden` is not passed to the GRU, so it starts from its default all-zeros state
        output, x_h = self.gru(output)  # output: length x batch x 2H, hidden: 2 x 1 x H
        # concatenate forward and backward final states -> 1 x 1 x 2H (only valid when biflag=True)
        final = torch.cat([x_h[0:x_h.size(0):2], x_h[1:x_h.size(0):2]], dim=2)
        return output, final

    def initHidden(self):
        # allocate on the same device as the model's parameters
        device = self.embedding.weight.device
        if self.biflag:
            return torch.zeros(2, 1, self.hidden_size, device=device)
        else:
            return torch.zeros(1, 1, self.hidden_size, device=device)
```
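Because `nn.GRU` defaults to `batch_first=False`, the axis that `unsqueeze` adds decides whether the 15 tokens are read as one sequence or as 15 length-1 sequences. The sketch below is a hypothetical CPU-only check (it never touches cuDNN) using the sizes from the post; the variable names are made up for illustration. It compares the two choices and the shape of the concatenated final state.

```python
import torch
import torch.nn as nn

# Sizes from the post: vocab 10889, hidden 128, bidirectional GRU.
vocab_size, hidden = 10889, 128
embedding = nn.Embedding(vocab_size, hidden)
gru = nn.GRU(hidden, hidden, bidirectional=True)

tokens = torch.tensor([0, 83, 0, 1257, 0, 25, 34, 239, 31, 275, 1057, 171, 7, 809, 174])
embedded = embedding(tokens)  # (15, 128)

# unsqueeze(0): the GRU sees seq_len=1, batch=15
out0, h0 = gru(embedded.unsqueeze(0))
# unsqueeze(1): the GRU sees seq_len=15, batch=1
out1, h1 = gru(embedded.unsqueeze(1))

# forward and backward final states concatenated, as in EncoderRNN.forward
final = torch.cat([h1[0::2], h1[1::2]], dim=2)

for name, t in [("unsqueeze(0) output", out0), ("unsqueeze(0) hidden", h0),
                ("unsqueeze(1) output", out1), ("unsqueeze(1) hidden", h1),
                ("final", final)]:
    print(name, tuple(t.shape))
```

Only the `unsqueeze(1)` variant produces a hidden state of shape `2 x 1 x H`, which is what an `initHidden()` of that shape assumes.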
My current setup: torch `1.0.1.post2`, CUDA `9.0.176`, four RTX 2080 Ti GPUs, and the GPU memory is empty (nothing else is running on the cards).
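The CUDA version reported by the system toolkit may differ from the one the installed torch package was built against, so it can help to print what PyTorch itself sees. The helper below is a hypothetical sketch (`cuda_environment_report` is an illustrative name) that only uses standard `torch` attributes. For reference, the RTX 2080 Ti is a Turing card with compute capability 7.5, and CUDA 10.0 is the first toolkit release that supports that architecture.

```python
import torch

def cuda_environment_report():
    """Collect the version numbers relevant to a cuDNN execution failure."""
    report = {
        "torch": torch.__version__,
        "torch_built_with_cuda": torch.version.cuda,  # None for CPU-only builds
        "cudnn": torch.backends.cudnn.version() if torch.backends.cudnn.is_available() else None,
        "cuda_available": torch.cuda.is_available(),
        "devices": [],
    }
    if torch.cuda.is_available():
        for i in range(torch.cuda.device_count()):
            report["devices"].append({
                "name": torch.cuda.get_device_name(i),
                "capability": torch.cuda.get_device_capability(i),  # (major, minor)
            })
    return report

for key, value in cuda_environment_report().items():
    print(f"{key}: {value}")
```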
Here is a list of things I've tried, based on discussions here and elsewhere, none of which worked:

- Upgrading to torch 1.1.0
- Reinstalling torch 1.0.1.post2
- Explicitly moving the model and tensors to CUDA
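One way to narrow down where the failure comes from is to run the same GRU with cuDNN switched off through the global `torch.backends.cudnn.enabled` flag, which makes PyTorch use its own (slower) CUDA kernels. This is a diagnostic sketch, not a fix, and `gru_runs` is a hypothetical helper. If the GPU run succeeds only with cuDNN disabled, the cuDNN build is the more likely suspect; if it fails either way, the problem more likely sits lower in the CUDA stack.

```python
import torch
import torch.nn as nn

def gru_runs(device, use_cudnn):
    """Run one forward pass of a bidirectional GRU; return True on success."""
    previous = torch.backends.cudnn.enabled
    torch.backends.cudnn.enabled = use_cudnn  # has no effect on CPU tensors
    try:
        gru = nn.GRU(128, 128, bidirectional=True).to(device)
        x = torch.randn(15, 1, 128, device=device)  # (seq_len, batch, H)
        out, _ = gru(x)
        if device.type == "cuda":
            torch.cuda.synchronize(device)  # surface asynchronous kernel errors here
        return tuple(out.shape) == (15, 1, 256)
    except RuntimeError as err:
        print(f"{device}, cudnn={use_cudnn}: {err}")
        return False
    finally:
        torch.backends.cudnn.enabled = previous  # restore the global flag

devices = [torch.device("cpu")]
if torch.cuda.is_available():
    devices.append(torch.device("cuda:0"))
for dev in devices:
    for flag in (True, False):
        print(dev, "with cuDNN" if flag else "without cuDNN", gru_runs(dev, flag))
```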
Does anyone have a suggestion for what I should try next? Thanks a lot!