Why is the default output dimensionality of GRU/LSTM layers (seq_len x batch_size x hidden_dim)? This is quite confusing, as I need to permute dimensions to make sure that the next layer (e.g. a fully-connected layer) takes the batch rather than the sequence as its first dimension.
view() can be a dangerous function. Just because it throws no error and the next layer accepts its output doesn’t necessarily mean that the reshaping is “semantically” correct.
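To make the difference concrete, here is a minimal sketch (with a small made-up tensor) showing that view() and transpose() produce the same shape but different contents, because view() only reinterprets the flat memory order while transpose() actually swaps the axes:

```python
import torch

t = torch.arange(6).reshape(2, 3)  # pretend dim 0 is seq_len, dim 1 is batch

v = t.view(3, 2)        # same memory order, rows scrambled across the new dims
tr = t.transpose(0, 1)  # axes genuinely swapped

print(v)
print(tr)
# Same shape, different contents: with view(), elements from different
# "sequences" end up mixed into the same row.
```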
I’m not saying that your use of view() in this case is wrong, but I would strongly recommend transpose() to swap the batch_size and seq_len dimensions. In your case that would be:
out_batch = out_batch.transpose(0,1)
Alternatively, you can create out_lstm with LSTM(..., batch_first=True) so that it accepts your out_batch directly after the embedding layer.