PyTorch LSTM Dimension Issue

I’m pretty sure that’s your issue.

Before that line, lstm_out has a shape of (batch_size, seq_len, num_directions*hidden_dim), assuming batch_first=True. After the .view() it’s (batch_size*seq_len*num_directions, hidden_dim), which is probably also not what you want. With batch_size=50, seq_len=200 and num_directions=1, the shape is as expected: (10000, hidden_dim).
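Here’s a minimal sketch of those shapes (the dimensions are taken from your setup; input_dim and hidden_dim are placeholder values I picked for illustration):

```python
import torch
import torch.nn as nn

batch_size, seq_len, input_dim, hidden_dim = 50, 200, 32, 64

lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)
x = torch.randn(batch_size, seq_len, input_dim)

lstm_out, (h_n, c_n) = lstm(x)
print(lstm_out.shape)  # torch.Size([50, 200, 64])

# The .view() flattens batch and time into one dimension:
flat = lstm_out.contiguous().view(-1, hidden_dim)
print(flat.shape)      # torch.Size([10000, 64])
```

So the 10,000 rows are not 10,000 samples; they are 50 samples × 200 time steps.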

This means you’ve created a tensor that is interpreted as having 10,000 samples. Given your classification task, here are two suggestions:

  • Don’t use lstm_out but lstm_hidden. lstm_out contains the hidden state at each time step; lstm_hidden contains only the last hidden state.

  • If you use lstm_out, you may want to sum or average the hidden states over the time steps.
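Both options can be sketched like this (num_classes and the layer names are placeholders, not from your code; for a single-layer, unidirectional LSTM, h_n[-1] is the last hidden state per sample):

```python
import torch
import torch.nn as nn

batch_size, seq_len, input_dim, hidden_dim, num_classes = 50, 200, 32, 64, 5

lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)
fc = nn.Linear(hidden_dim, num_classes)
x = torch.randn(batch_size, seq_len, input_dim)

lstm_out, (h_n, c_n) = lstm(x)

# Option 1: use the last hidden state.
# h_n has shape (num_layers*num_directions, batch_size, hidden_dim),
# so h_n[-1] is (batch_size, hidden_dim).
logits_last = fc(h_n[-1])               # (50, num_classes)

# Option 2: average lstm_out over the time dimension.
logits_mean = fc(lstm_out.mean(dim=1))  # (50, num_classes)
```

Either way you end up with one prediction per sample instead of one per time step.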
