RuntimeError: Expected hidden[0] size (2, 5, 250), got (2, 30, 250)

While building a semantic model, I ran into the runtime error shown in the title.
I am not sure whether it is caused by my batching, train function, model architecture, or hyperparameters. I have therefore linked the Kaggle notebook, where you can view the entire code and run it on the free Kaggle virtual machine with access to all the data I used.

My model architecture is below, since I think the error may be in here. If I am wrong, please click the link to see the whole code and tell me the solution.

import torch.nn as nn

class LSTM(nn.Module):
    def __init__(self,vocab_size, embedding_dim, hidden_size, n_layers):
        super(LSTM, self).__init__()
        self.hidden_size = hidden_size
        self.n_layers = n_layers
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.lstm1 = nn.LSTM(embedding_dim, self.hidden_size, num_layers = self.n_layers, dropout = 0.2, batch_first = True)
        self.fc1 = nn.Linear(self.hidden_size, 1)
        self.sigmoid = nn.Sigmoid()
    def forward(self, x, hidden):
        batch_size = x.size(0)
        sequence_len = x.size(1)
        embeddings = self.embedding(x)
        lstm_out, hidden = self.lstm1(embeddings, hidden)
        output = lstm_out.contiguous().view(-1, self.hidden_size)
        output = self.sigmoid(self.fc1(output))
        output = output.reshape(batch_size, sequence_len, -1)
        output = output[:, -1]
        return hidden, output
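    def init_hidden(self, batch_size):
        # Zero-initialized (h_0, c_0), each of shape (n_layers, batch_size, hidden_size)
        weight = next(self.parameters()).data
        if (train_on_gpu):
            hidden = (weight.new(self.n_layers, batch_size, self.hidden_size).zero_().cuda(),
                      weight.new(self.n_layers, batch_size, self.hidden_size).zero_().cuda())
        else:
            hidden = (weight.new(self.n_layers, batch_size, self.hidden_size).zero_(),
                      weight.new(self.n_layers, batch_size, self.hidden_size).zero_())
        return hidden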

I would really appreciate it if someone could help me fix this error.

Based on the error message, I guess you are using a batch size of 30, while the last batch in your training loop seems to be smaller (5 samples). The hidden state, however, is still initialized with the original batch size, which would raise this error.
You would have to either reinitialize the hidden with the current batch size or drop the last batch via drop_last=True in the DataLoader.
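Both options can be sketched roughly as follows. This is a minimal illustration, not the code from the notebook: the toy dataset (65 samples, so the last batch of 30 holds only 5), the vocabulary and embedding sizes, and the standalone `init_hidden` helper are all assumptions made for the example.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-ins (assumed sizes): 65 samples with batch size 30 leaves a last batch of 5
data = torch.randint(0, 100, (65, 12))        # token ids, sequence length 12
labels = torch.randint(0, 2, (65,)).float()
dataset = TensorDataset(data, labels)

embedding = nn.Embedding(100, 16)
lstm = nn.LSTM(16, 250, num_layers=2, batch_first=True)

def init_hidden(n_layers, batch_size, hidden_size):
    # Hypothetical helper: zero (h_0, c_0) sized for the given batch
    return (torch.zeros(n_layers, batch_size, hidden_size),
            torch.zeros(n_layers, batch_size, hidden_size))

# Option 1: build the hidden state from the size of the current batch
loader = DataLoader(dataset, batch_size=30)
seen_batch_sizes = []
for x, y in loader:
    hidden = init_hidden(2, x.size(0), 250)   # x.size(0) is 5 for the last batch
    out, hidden = lstm(embedding(x), hidden)
    seen_batch_sizes.append(out.size(0))

# Option 2: let the DataLoader discard the incomplete last batch
loader_dropped = DataLoader(dataset, batch_size=30, drop_last=True)
dropped_batch_sizes = [x.size(0) for x, _ in loader_dropped]
```

With the first option no samples are lost; with the second, every batch has the full size, at the cost of skipping the leftover samples in each epoch.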

Unrelated to this issue, but here it seems you are flattening the activation to [batch_size*seq_len, hidden_size] and passing it to the linear layer:

output = lstm_out.contiguous().view(-1, self.hidden_size)
output = self.sigmoid(self.fc1(output))

However, afterwards you are reshaping it back to [batch_size, seq_len, 1] (fc1 has a single output feature) and slicing the last time step:

output = output.reshape(batch_size, sequence_len, -1)
output = output[:, -1]

If that’s the case, you could also index the last time step before applying the linear layer to avoid unnecessary computation.
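A small sketch of that idea, using a random tensor as a stand-in for `lstm_out` (the sizes here are assumed for illustration), compares the original flatten/reshape/slice path with slicing first:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch_size, seq_len, hidden_size = 4, 7, 250   # assumed toy sizes

# Stand-in for the LSTM output with batch_first=True: [batch_size, seq_len, hidden_size]
lstm_out = torch.randn(batch_size, seq_len, hidden_size)
fc1 = nn.Linear(hidden_size, 1)
sigmoid = nn.Sigmoid()

# Original path: apply the linear layer to every time step, then keep the last one
flat = lstm_out.contiguous().view(-1, hidden_size)
full = sigmoid(fc1(flat)).reshape(batch_size, seq_len, -1)[:, -1]

# Suggested path: keep only the last time step, then apply the linear layer
last = sigmoid(fc1(lstm_out[:, -1]))
```

Both paths should give the same [batch_size, 1] result, but the second one runs the linear layer on batch_size rows instead of batch_size*seq_len rows.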

Thanks, this is the solution.