-
I think that if you give an nn.Embedding input of shape (seq_len, batch_size), it will happily produce output of shape (seq_len, batch_size, embedding_size). nn.Embedding simply replaces every index in the input with its embedding vector, so the order of the input dimensions has no importance.
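A quick sketch to confirm that shape behaviour (the sizes here are arbitrary, just for illustration):

```python
import torch
import torch.nn as nn

seq_len, batch_size, vocab_size, embedding_size = 7, 4, 100, 16
embedding = nn.Embedding(vocab_size, embedding_size)

# a (seq_len, batch_size) tensor of token indices
tokens = torch.randint(0, vocab_size, (seq_len, batch_size))

out = embedding(tokens)
print(out.shape)  # torch.Size([7, 4, 16])

# swapping the input dimensions just swaps the output dimensions
print(embedding(tokens.t()).shape)  # torch.Size([4, 7, 16])
```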
-
Your LSTM input and output sizes look mostly right to me. This post helped me get my head around them: Understanding output of lstm
You can initialise nn.LSTM with batch_first=True if you need to swap the seq_len and batch_size dimensions of the input and output. If the input to nn.Embedding is appropriately shaped, I can't see why a .view operation before the LSTM should be necessary.
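A small sketch of what batch_first=True changes (sizes are made up; note it affects the input and output sequence tensors, but h_n and c_n stay (num_layers, batch, hidden)):

```python
import torch
import torch.nn as nn

seq_len, batch_size, input_size, hidden_size = 5, 3, 16, 32
lstm = nn.LSTM(input_size, hidden_size, batch_first=True)

# with batch_first=True the input is (batch, seq_len, feature)
x = torch.randn(batch_size, seq_len, input_size)
output, (h_n, c_n) = lstm(x)

print(output.shape)  # (batch, seq_len, hidden): torch.Size([3, 5, 32])
# the hidden/cell states are unaffected by batch_first
print(h_n.shape)     # (num_layers, batch, hidden): torch.Size([1, 3, 32])
```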
- For consuming last hidden state only…
lstm_output, (last_hidden_state, last_cell_state) = self.lstm(embedded)
linear_input = last_hidden_state[-1] # get hidden state for last layer
# or, for a unidirectional LSTM, equivalently
linear_input = lstm_output[-1] # get last time step of output
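A runnable sketch checking that equivalence (arbitrary sizes; it holds for a unidirectional LSTM with the default seq-first layout, since lstm_output[-1] is the top layer's hidden state at the final time step):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
seq_len, batch_size, emb, hidden = 6, 2, 8, 12
lstm = nn.LSTM(emb, hidden, num_layers=2)  # default: input is (seq, batch, feature)

embedded = torch.randn(seq_len, batch_size, emb)
lstm_output, (last_hidden_state, last_cell_state) = lstm(embedded)

# hidden state of the top (last) layer at the final step...
a = last_hidden_state[-1]  # (batch, hidden)
# ...equals the final time step of the output sequence
b = lstm_output[-1]        # (batch, hidden)
print(torch.allclose(a, b))  # True
```

For a bidirectional LSTM the two are not the same, because lstm_output[-1] mixes the forward direction's last step with the backward direction's first step.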
For consuming the hidden states of the whole sequence
lstm_output, (last_hidden_state, last_cell_state) = self.lstm(embedded)
batch_first = lstm_output.transpose(0, 1)
# transpose returns a non-contiguous view, so call .contiguous() before .view
linear_input = batch_first.contiguous().view(batch_size, -1)
Note that in this case the sequence length must always be the same.
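A runnable sketch of that whole-sequence flattening (sizes are arbitrary); the transpose yields a non-contiguous tensor, so .contiguous() (or .reshape) is needed before .view:

```python
import torch
import torch.nn as nn

seq_len, batch_size, emb, hidden = 5, 3, 8, 12
lstm = nn.LSTM(emb, hidden)

embedded = torch.randn(seq_len, batch_size, emb)
lstm_output, _ = lstm(embedded)

# (seq, batch, hidden) -> (batch, seq, hidden)
batch_first = lstm_output.transpose(0, 1)
# flatten every time step's hidden state into one vector per batch element
linear_input = batch_first.contiguous().view(batch_size, -1)
print(linear_input.shape)  # torch.Size([3, 60]), i.e. (batch, seq_len * hidden)
```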
Most tensor ops also work on Variables, which is necessary if you want to backpropagate. If you operate on raw tensors directly, those operations are not recorded in the computation graph and cannot be backpropagated through.