Google Colab RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

I am using Google Colab to train a bidirectional RNN model, and I get the following error:

RuntimeError                              Traceback (most recent call last)
<ipython-input-34-0029e71ae99b> in <module>()
     20             inputs, labels =,
---> 22             output = model(inputs)
     23             loss = criterion(output.squeeze(), labels.float())
     24             optimizer.zero_grad()

5 frames
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/ in forward_impl(self, input, hx, batch_sizes, max_batch_size, sorted_indices)
    524         if batch_sizes is None:
    525             result = _VF.lstm(input, hx, self._get_flat_weights(), self.bias, self.num_layers,
--> 526                               self.dropout,, self.bidirectional, self.batch_first)
    527         else:
    528             result = _VF.lstm(input, batch_sizes, hx, self._get_flat_weights(), self.bias,


I tried this solution and this one, but I still get the error.

Here’s my BiRNN model code:

import torch
import torch.nn as nn

class BiRNN(nn.Module):
    def __init__(self, n_vocab, n_embed, hidden_size, seq_len, num_layers, output_size, drop_prob):
        super(BiRNN, self).__init__()
        self.hidden_size = hidden_size
        self.seq_len = seq_len
        self.num_layers = num_layers
        self.embedding = nn.Embedding(n_vocab, n_embed)
        self.lstm = nn.LSTM(n_embed, hidden_size, num_layers, batch_first=True, bidirectional=True)
        self.dropout = nn.Dropout(drop_prob)
        self.fc = nn.Linear(hidden_size*2, output_size)

    def forward(self, x):
        # Set initial hidden and cell states (num_layers * 2 because the LSTM is bidirectional)
        h0 = torch.zeros(self.num_layers*2, x.size(0), self.hidden_size).to(device)  
        c0 = torch.zeros(self.num_layers*2, x.size(0), self.hidden_size).to(device)
        x = self.embedding(x).to(device)
        # Forward propagate LSTM
        lstm_out, _ = self.lstm(x, (h0, c0))  
        lstm_out = lstm_out.contiguous().view(-1, self.seq_len, 2, self.hidden_size)
        # get backward output in first node
        lstm_out_bw = lstm_out[:, 0, 1, :]
        # get forward output in last node
        lstm_out_fw = lstm_out[:, -1, 0, :]
        # concatenate forward and backward outputs along the feature dimension
        lstm_out = torch.cat((lstm_out_fw, lstm_out_bw), -1)
        drop_out = self.dropout(lstm_out)
        logits = self.fc(drop_out)

        return logits

Which PyTorch, CUDA, and cuDNN versions are you using?
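
One way to collect these versions from inside the notebook is sketched below. This is a minimal example, not something posted in the thread: the helper name `collect_versions` is made up for illustration, and it falls back gracefully if PyTorch is not installed.

```python
import platform


def collect_versions():
    """Return a dict of version strings that are useful in a bug report."""
    info = {"python": platform.python_version()}
    try:
        import torch
    except ImportError:
        info["torch"] = None  # PyTorch is not installed in this environment
        return info
    info["torch"] = torch.__version__
    info["cuda"] = torch.version.cuda  # None for CPU-only builds
    info["cudnn"] = torch.backends.cudnn.version()  # None if cuDNN is unavailable
    info["gpu_available"] = torch.cuda.is_available()
    return info


for name, value in collect_versions().items():
    print(f"{name}: {value}")
```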

I'm using Google Colab, which comes with PyTorch 1.3 and CUDA 10.1 by default.

Issue tracked here.

It's working now. I tried training on the CPU for a few epochs, and after some steps it showed a different error: an embedding index out of range error. After I resolved that error, training worked. I wonder why this error isn't shown when training on CUDA.
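
A likely explanation, though not confirmed in this thread, is that CUDA kernels run asynchronously, so an invalid embedding index can surface later as an unrelated-looking error, whereas the CPU reports it at the failing call. Checking the token indices against the vocabulary size before the forward pass catches the problem on either device. The sketch below is a hedged illustration: `find_out_of_range` and the sample batch are hypothetical, and `n_vocab` stands for the first argument passed to `nn.Embedding`.

```python
def find_out_of_range(batch, n_vocab):
    """Return (row, col, index) for every token index outside [0, n_vocab)."""
    bad = []
    for row, sequence in enumerate(batch):
        for col, index in enumerate(sequence):
            if not 0 <= index < n_vocab:
                bad.append((row, col, index))
    return bad


# Hypothetical batch of token ids; with a tensor, pass inputs.tolist() instead.
batch = [[1, 4, 2], [0, 7, 3]]
print(find_out_of_range(batch, n_vocab=5))  # [(1, 1, 7)]
```

An empty result means every index is valid for an embedding table with `n_vocab` rows.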

I ran into the same problem and fixed it: you just need to restart Google Colab.
