GRU fails with CUDNN_STATUS_EXECUTION_FAILED despite many attempted fixes

I have a standard encoder architecture and am trying to run it on the following input sequence:

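import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x_input = torch.tensor([0, 83, 0, 1257, 0, 25, 34, 239, 31, 275, 1057, 171, 7, 809, 174], device=device)
enc = EncoderRNN(10889, 128, True).to(device)  # vocab size 10889, hidden size 128, bidirectional
x_h = enc.initHidden()
x_out, x_h = enc(x_input, x_h)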

But I keep getting the error “cuDNN error: CUDNN_STATUS_EXECUTION_FAILED” when the embedding is passed to the GRU. The model works perfectly on CPU. I didn’t have this error until I tried to install allennlp, and the installation was interrupted due to a torch/CUDA incompatibility.

Traceback message:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 52, in <module>
  File "/anaconda2/envs/mypython3/lib/python3.7/site-packages/torch/nn/modules/", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "", line 46, in forward
    output, x_h = self.gru(output) # output: length x batch x 2H, hidden: 2 x 1 x H
  File "/anaconda2/envs/mypython3/lib/python3.7/site-packages/torch/nn/modules/", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/anaconda2/envs/mypython3/lib/python3.7/site-packages/torch/nn/modules/", line 179, in forward
    self.dropout,, self.bidirectional, self.batch_first)

Encoder definition:

import torch
import torch.nn as nn

class EncoderRNN(nn.Module):
    def __init__(self, input_size, hidden_size, biflag):
        super(EncoderRNN, self).__init__()
        self.hidden_size = hidden_size
        self.biflag = biflag
        self.embedding = nn.Embedding(input_size, hidden_size)  # input size is vocab size
        if biflag:
            self.gru = nn.GRU(hidden_size, hidden_size, bidirectional=True)
        else:
            self.gru = nn.GRU(hidden_size, hidden_size, bidirectional=False)

    def forward(self, input, hidden):
        embedded = self.embedding(input)  # length x H
        output = embedded.unsqueeze(1)    # length x batch(=1) x H, since batch_first=False
        output, x_h = self.gru(output, hidden)  # output: length x 1 x 2H, x_h: 2 x 1 x H (bidirectional)
        if self.biflag:
            # concatenate the forward and backward final states: 1 x 1 x 2H
            final = torch.cat([x_h[0:x_h.size(0):2], x_h[1:x_h.size(0):2]], dim=2)
        else:
            final = x_h  # 1 x 1 x H
        return output, final

    def initHidden(self):
        # create the initial state on the same device as the model's parameters
        device = self.embedding.weight.device
        if self.biflag:
            return torch.zeros(2, 1, self.hidden_size, device=device)
        else:
            return torch.zeros(1, 1, self.hidden_size, device=device)
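For the bidirectional case, the slicing in `forward` relies on how `nn.GRU` lays out `h_n`: its first dimension is `num_layers * num_directions`, with the forward and backward states of each layer stored next to each other. The following CPU-only sketch (arbitrary small sizes, a single layer, batch of 1) shows the resulting shapes and checks that the concatenated state matches the corresponding slices of `output`.

```python
import torch
import torch.nn as nn

H, seq_len, batch = 8, 15, 1  # arbitrary sizes for the illustration
gru = nn.GRU(H, H, bidirectional=True)  # single layer; input is (seq_len, batch, H)
x = torch.randn(seq_len, batch, H)

output, h_n = gru(x)  # output: (seq_len, batch, 2H), h_n: (2, batch, H)

# Rows of h_n alternate per layer: [fwd_0, bwd_0, fwd_1, bwd_1, ...]
final = torch.cat([h_n[0::2], h_n[1::2]], dim=2)  # (num_layers, batch, 2H)

print(output.shape)  # torch.Size([15, 1, 16])
print(h_n.shape)     # torch.Size([2, 1, 8])
print(final.shape)   # torch.Size([1, 1, 16])

# The forward direction ends at the last time step (first half of the features);
# the backward direction ends at the first time step (second half of the features).
print(torch.allclose(final[0, :, :H], output[-1, :, :H]))  # True
print(torch.allclose(final[0, :, H:], output[0, :, H:]))   # True
```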

My current setup: torch 1.0.1.post2, CUDA 9.0.176, four RTX 2080 Ti GPUs. GPU memory is empty.
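When a CUDA error appears only on the GPU, it helps to report the versions PyTorch itself sees rather than the system toolkit. A minimal sketch; `describe_cuda_setup` is a hypothetical helper name, while `torch.version.cuda`, `torch.backends.cudnn.version()` and `torch.cuda.get_device_capability()` are standard PyTorch calls. The pip/conda binaries bundle their own CUDA runtime, so `torch.version.cuda` can differ from what `nvcc --version` reports.

```python
def describe_cuda_setup():
    """Return a dict describing the PyTorch/CUDA/cuDNN build and the visible GPUs."""
    import torch  # imported lazily so the helper can be defined without touching the GPU

    info = {
        "torch": torch.__version__,
        # CUDA version this PyTorch binary was built against (None for CPU-only builds)
        "cuda_build": torch.version.cuda,
        "cudnn": torch.backends.cudnn.version() if torch.backends.cudnn.is_available() else None,
        "gpus": [],
    }
    if torch.cuda.is_available():
        for i in range(torch.cuda.device_count()):
            info["gpus"].append({
                "name": torch.cuda.get_device_name(i),
                "capability": torch.cuda.get_device_capability(i),  # e.g. (7, 5) for Turing
            })
    return info
```

Calling `print(describe_cuda_setup())` gives a compact summary to paste into a bug report; each RTX 2080 Ti should report capability `(7, 5)`.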

Here is what I’ve tried, based on discussions here and elsewhere, none of which worked:

  • Upgrade to torch 1.1.0

  • Reinstall torch 1.0.1.post2

  • Explicitly move the model and variables to CUDA
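One way to separate a broken cuDNN setup from a bug in the model is to run a single GRU on the GPU with cuDNN enabled and then disabled, and compare both against the CPU result. A minimal sketch, assuming identical weights on both devices; `gru_gpu_vs_cpu` is a hypothetical helper, and `torch.backends.cudnn.enabled` is the global switch PyTorch exposes for cuDNN (when it is off, PyTorch uses its own GRU kernels on CUDA).

```python
import copy

import torch
import torch.nn as nn

def gru_gpu_vs_cpu(hidden_size=128, seq_len=15, use_cudnn=True):
    """Max abs difference between CPU and CUDA outputs of the same GRU, or None without a GPU."""
    if not torch.cuda.is_available():
        return None
    torch.manual_seed(0)
    gru_cpu = nn.GRU(hidden_size, hidden_size, bidirectional=True)
    gru_gpu = copy.deepcopy(gru_cpu).cuda()  # identical weights, moved to the GPU
    x = torch.randn(seq_len, 1, hidden_size)

    with torch.no_grad():
        ref, _ = gru_cpu(x)
        previous = torch.backends.cudnn.enabled
        torch.backends.cudnn.enabled = use_cudnn
        try:
            out, _ = gru_gpu(x.cuda())  # raises RuntimeError if the cuDNN call fails
        finally:
            torch.backends.cudnn.enabled = previous  # restore the global flag
    return (ref - out.cpu()).abs().max().item()
```

If `gru_gpu_vs_cpu(use_cudnn=True)` raises the cuDNN error while `gru_gpu_vs_cpu(use_cudnn=False)` returns a small difference, the model code is probably fine and the installation is the more likely suspect.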

Does anyone have suggestions on what I should try next? Thanks a lot!

Could you update to CUDA 10? CUDA 9 is not recommended for Turing GPUs such as the RTX 2080 Ti.
Also, which errors/issues are you seeing when trying to update PyTorch?
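The reasoning behind the CUDA 10 suggestion can be expressed as a simple lookup: each GPU architecture has a compute capability, and a CUDA toolkit can only generate native code for capabilities it knows about. Turing cards such as the RTX 2080 Ti are compute capability 7.5, which CUDA first supported in 10.0. A minimal sketch with a deliberately partial table; `MIN_CUDA_FOR_CAPABILITY` and `cuda_supports_gpu` are illustrative names, and NVIDIA's documentation remains the authoritative source.

```python
# First CUDA toolkit release with native support for each compute capability.
# Deliberately partial: only a few architectures are listed.
MIN_CUDA_FOR_CAPABILITY = {
    (7, 0): (9, 0),   # Volta, e.g. V100
    (7, 5): (10, 0),  # Turing, e.g. RTX 2080 Ti
    (8, 0): (11, 0),  # Ampere, e.g. A100
    (8, 6): (11, 1),  # Ampere, e.g. RTX 3090
}

def parse_cuda_version(version):
    """'9.0.176' -> (9, 0); only major.minor matter for this check."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)

def cuda_supports_gpu(cuda_version, capability):
    """True/False if the capability is in the table, None if it is unknown."""
    required = MIN_CUDA_FOR_CAPABILITY.get(tuple(capability))
    if required is None:
        return None
    return parse_cuda_version(cuda_version) >= required

print(cuda_supports_gpu("9.0.176", (7, 5)))  # False: CUDA 9.0 predates Turing
print(cuda_supports_gpu("10.1", (7, 5)))     # True
```

"Unsupported" here means no native kernels for that architecture; PTX built for an older architecture can still be JIT-compiled by the driver, so the symptom is not always an immediate crash.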

I didn’t see any errors when upgrading/downgrading to PyTorch 1.1.0/1.0.1.post2. The error came up when I ran “pip install pytorch-fast-elmo”, which was interrupted by a conflict with allennlp:

ERROR: allennlp 0.9.0 has requirement torch>=1.2.0, but you'll have torch 1.0.1.post2 which is incompatible.
Installing collected packages: torch, pytorch-stateful-lstm, h5py, fire, pytorch-fast-elmo
  Found existing installation: torch 1.2.0
    Uninstalling torch-1.2.0:
      Successfully uninstalled torch-1.2.0
    Running install for pytorch-stateful-lstm ... done
  Found existing installation: h5py 2.10.0
    Uninstalling h5py-2.10.0:
      Successfully uninstalled h5py-2.10.0
    Running install for fire ... done
    Running install for pytorch-fast-elmo ... done
Successfully installed fire-0.1.3 h5py-2.9.0 pytorch-fast-elmo-0.6.12 pytorch-stateful-lstm-1.6.0 torch-1.0.1.post2
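The pip message comes from a plain version-range check: allennlp 0.9.0 declares `torch>=1.2.0`, and 1.0.1.post2 sorts below 1.2.0. A simplified sketch of that comparison in pure Python; it only looks at the numeric release segment (so a `.post2` suffix or a `+cu101` local tag is ignored, and a pre-release such as `1.2.0rc1` would wrongly count as `1.2.0`), whereas pip applies the full PEP 440 rules. The function names are hypothetical.

```python
import re

def release_tuple(v):
    """'1.0.1.post2' -> (1, 0, 1); '1.4.0+cu101' -> (1, 4, 0).

    Simplified: keeps only the leading numeric release segment and ignores
    pre/post/dev/local suffixes.
    """
    match = re.match(r"\d+(?:\.\d+)*", v)
    if match is None:
        raise ValueError(f"unrecognised version string: {v!r}")
    return tuple(int(part) for part in match.group(0).split("."))

def meets_minimum(installed, minimum):
    """True if `installed` satisfies `>= minimum` (release segments only)."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))

    def pad(t):
        return t + (0,) * (width - len(t))  # so that 1.2 compares equal to 1.2.0

    return pad(a) >= pad(b)

print(meets_minimum("1.0.1.post2", "1.2.0"))  # False: the conflict pip reported
print(meets_minimum("1.4.0", "1.2.0"))        # True
```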

The cuDNN error appeared after this pip command. I then reinstalled PyTorch, first 1.1.0 and then 1.0.1.post2 (the version I had been using, which worked fine before this pip command).

Could you update to the latest stable PyTorch version (1.4.0) or, as the error message suggests, to at least 1.2.0?
I would recommend using the latest version if possible, as it ships with new features as well as bug fixes.

Thanks! I installed PyTorch 1.4.0 with CUDA 10.1 and the issue is resolved.