Different behaviour on CPU and GPU: "RuntimeError: hidden of RNN is not contiguous"

For me, this is quite an interesting problem. I ran the same code but got different results on CPU and GPU. To be specific, the code below runs on the CPU without raising any error. However, when running on the GPU, the following error is raised:

RuntimeError: rnn: hx is not contiguous

The code of my class is given as follows:

class EncoderLSTM(nn.Module):
    def __init__(self, voc_size, hidden_size=HIDDEN_SIZE, max_length=MAX_LENGTH+2):
        super(EncoderLSTM, self).__init__()
        self.hidden_size = hidden_size
        self.max_length = max_length

        self.memorising = nn.Embedding(voc_size, self.hidden_size)
        self.attn = Attn(hidden_size)
        self.dropout = nn.Dropout(DROPOUT_RATIO)
        self.lstm = nn.LSTM(hidden_size, hidden_size)
        self.init_hidden = self.init_hidden_and_cell()
        self.init_cell = self.init_hidden_and_cell()

    def forward(self, embedded_input_var):
        batch_size = embedded_input_var.shape[0]

        # Initialise the initial hidden and cell states for encoder
        last_hidden = self.init_hidden.expand(-1, batch_size, -1)
        last_cell = self.init_cell.expand(-1, batch_size, -1)
        # Forward pass through LSTM
        for t in range(NUM_WORD):
            # Calculate attention weights from the current LSTM input
            attn_weights = self.attn(last_hidden, embedded_input_var)
            # Calculate the attention weighted representation
            r = attn_weights.bmm(embedded_input_var).transpose(0, 1)
            # Forward through unidirectional LSTM
            lstm_output, (lstm_hidden, lstm_cell) = self.lstm(r, (last_hidden, last_cell))

        # Return hidden and cell state of LSTM
        return lstm_hidden, lstm_cell

    def init_hidden_and_cell(self):
        return nn.Parameter(torch.zeros(1, 1, self.hidden_size, device=DEVICE))

I guess the cause of this error is that the variable last_hidden, created with expand() in forward(), is not contiguous, since expand() returns a view rather than a copy. What I don't understand, however, is why the behaviour differs between CPU and GPU.

Why doesn't running on CPU raise this error?

I'm no expert, but I suppose the CPU and GPU backends handle memory differently. My understanding is that the GPU path goes through cuDNN, which requires the hidden state to be contiguous, while the CPU implementation can cope with strided tensors. Read this for a good explanation of contiguity.
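You can see the non-contiguity directly. A small sketch (the sizes here are made up for illustration) showing that expand() produces a zero-stride view:

```python
import torch

# expand() returns a view that repeats data without allocating new memory,
# so the expanded tensor's strides no longer describe a contiguous block.
h = torch.zeros(1, 1, 4)         # shape (num_layers, batch=1, hidden_size)
expanded = h.expand(-1, 8, -1)   # pretend batch_size = 8

print(h.is_contiguous())         # True
print(expanded.is_contiguous())  # False
print(expanded.stride())         # (4, 0, 1) -- the zero stride marks the view
```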

Anyway, try calling the .contiguous() method on the tensor that needs it.