LSTM layer dimensionality

Why is the default layout of GRU/LSTM layers (seq_length x batch_size x hidden_dim)? This is quite confusing, because I need extra permutations to make sure that the next layer (e.g. fc) takes the batch rather than the sequence as its leading dimension.

What’s the rationale for this?

The next layer still takes in the batch. After the RNN has run over all time steps, its output has shape (seq_length, batch_size, hidden_dim), so selecting a single time step (e.g. the last one) leaves a (batch_size, hidden_dim) tensor.
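To make the shapes concrete, here is a minimal sketch with the default layout (batch_first=False). The sizes and the variable names (lstm, fc, last) are made up for illustration, and it assumes a single-layer, unidirectional LSTM:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Illustrative sizes only (assumptions, not from the thread).
seq_len, batch_size, input_dim, hidden_dim = 5, 3, 4, 6

lstm = nn.LSTM(input_dim, hidden_dim)  # default: batch_first=False
fc = nn.Linear(hidden_dim, 2)

x = torch.randn(seq_len, batch_size, input_dim)
out, (h_n, c_n) = lstm(x)              # out: (seq_len, batch_size, hidden_dim)

last = out[-1]                         # last time step: (batch_size, hidden_dim)
logits = fc(last)                      # (batch_size, 2)

print(out.shape, last.shape, logits.shape)
```

Indexing the first dimension removes the sequence axis, so the linear layer sees one row per sample. For this single-layer, unidirectional case, out[-1] holds the same values as h_n[-1].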

OK so is this correct?

>     def forward(self, x):
>         b = x.size(0)  # batch size
>         out_batch = self.embedding(x.long())
>         # lstm - convert input to (seq_length x batch_size x num_features) dims
>         out_lstm, (h_t, c_t) = self.lstm(out_batch.view(-1, b, self.embedding_dim))
>         # keep only the slice at the last index of the first dimension
>         out_lstm = out_lstm[-1,:,:]
>         out_fc1 = self.fc1(out_lstm)
>         return out_fc1

where self.lstm = nn.LSTM(embedding_dim, hidden_dim_lstm, n_layers, batch_first=True)

view() can be a dangerous function. Just because it throws no error and the next layer accepts the result as input doesn’t necessarily mean that the reshaping is “semantically” correct.

I’m not saying that your use of view() in this case is wrong, but I would strongly recommend transpose() to swap the batch_size and seq_len dimensions. In your case that would be:

out_batch = out_batch.transpose(0,1)
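A small sketch of why the two are not interchangeable, using a toy tensor whose values make the mixing visible. The sizes and the names viewed and transposed are illustrative assumptions:

```python
import torch

batch_size, seq_len, dim = 2, 3, 1  # toy sizes (assumed)
x = torch.arange(batch_size * seq_len * dim).reshape(batch_size, seq_len, dim)
# x[:, :, 0] is [[0, 1, 2], [3, 4, 5]]: one row per sample, one column per time step

viewed = x.view(seq_len, batch_size, dim)  # reinterprets memory order, mixes samples
transposed = x.transpose(0, 1)             # swaps axes, keeps each sample intact

# The "sequence" of sample 0 under each approach:
print(viewed[:, 0, 0].tolist())      # [0, 2, 4]
print(transposed[:, 0, 0].tolist())  # [0, 1, 2]
```

Both results have shape (seq_len, batch_size, dim), so the LSTM accepts either one without complaint, but only the transposed tensor keeps the time steps of sample 0 together.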

Alternatively, you can create self.lstm with nn.LSTM(..., batch_first=True) so that it accepts your out_batch directly after the embedding layer.

If you set batch_first=True, the input must have shape (B, L, D), i.e. (batch, sequence length, features).

batch_first – If True, then the input and output tensors are provided as (batch, seq, feature). Default: False
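Putting this together, here is one possible sketch of the forward pass with batch_first=True, where no view() or transpose() is needed. The class name LastStepClassifier and all sizes are hypothetical, chosen only for illustration. The attribute names follow the snippet in the question:

```python
import torch
import torch.nn as nn

class LastStepClassifier(nn.Module):  # hypothetical name
    def __init__(self, vocab_size, embedding_dim, hidden_dim_lstm, n_layers, n_classes):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim_lstm, n_layers, batch_first=True)
        self.fc1 = nn.Linear(hidden_dim_lstm, n_classes)

    def forward(self, x):
        out_batch = self.embedding(x.long())        # (batch, seq, embedding_dim)
        out_lstm, (h_t, c_t) = self.lstm(out_batch)  # (batch, seq, hidden_dim_lstm)
        out_lstm = out_lstm[:, -1, :]                # last time step: (batch, hidden_dim_lstm)
        return self.fc1(out_lstm)                    # (batch, n_classes)

# Illustrative sizes (assumed): 4 sequences of 7 token ids each.
model = LastStepClassifier(vocab_size=50, embedding_dim=8,
                           hidden_dim_lstm=16, n_layers=2, n_classes=3)
tokens = torch.randint(0, 50, (4, 7))
logits = model(tokens)
print(logits.shape)
```

Note that with batch_first=True the last time step is selected on the second dimension (out_lstm[:, -1, :]), not the first. The hidden and cell states h_t and c_t keep the shape (n_layers, batch, hidden_dim_lstm) regardless of batch_first.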