No idea why you are doing this… but don't :). Before that line, `h0` has the correct shape: `(num_layers, batch_size, hidden_size)`. After that line, it's only `(batch_size, hidden_size)`. Since `nn.RNN` sees only a 2d tensor, it interprets it as an unbatched input of shape `(seq_len, hidden_size)`.
The problem is that `nn.RNN` wants a 3d tensor, so it does an `unsqueeze(1)` to add the batch dimension: again, `nn.RNN` thinks it's an unbatched input, so it makes it a batched one. You can also check this post that addresses the same issue.