I’m pretty sure that’s your issue. Before that, `lstm_out` has a shape of `(batch_size, seq_len, num_directions*hidden_dim)`. After the `.view()` it’s `(batch_size*seq_len*num_directions, hidden_dim)` – note that this reshape might also be wrong. With `batch_size=50`, `seq_len=200`, and `num_directions=1`, the shape is as expected: `(10000, hidden_dim)`.
This means you’ve created a tensor that is interpreted as having 10,000 samples. Given your classification task, here are two suggestions:
- Don’t use `lstm_out` but `lstm_hidden`. `lstm_out` contains the hidden state at each time step; `lstm_hidden` contains only the last hidden state.
- If you use `lstm_out`, you may want to sum/avg the hidden states over all time steps.
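Both options can be sketched as follows (sizes and the `num_classes=5` output layer are hypothetical, assuming a single-layer unidirectional LSTM with `batch_first=True`):

```python
import torch
import torch.nn as nn

batch_size, seq_len, input_dim, hidden_dim, num_classes = 50, 200, 32, 128, 5

lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)
fc = nn.Linear(hidden_dim, num_classes)

x = torch.randn(batch_size, seq_len, input_dim)
lstm_out, (h_n, c_n) = lstm(x)

# Option 1: use only the last hidden state.
# h_n has shape (num_layers*num_directions, batch_size, hidden_dim),
# so h_n[-1] is the last layer's final hidden state: (50, 128).
logits_last = fc(h_n[-1])              # (50, 5)

# Option 2: average the per-step hidden states over the time dimension.
logits_avg = fc(lstm_out.mean(dim=1))  # (50, 5)
```

Either way the classifier sees one vector per sequence, so the batch dimension stays at 50 instead of exploding to 10,000.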