In the documentation, the output of torch.nn.LSTM is described as follows:
output (seq_len, batch, hidden_size * num_directions): tensor containing the output features (h_t) from the last layer of the RNN, for each t. If a torch.nn.utils.rnn.PackedSequence has been given as the input, the output will also be a packed sequence.
Previously I thought output was a tensor containing the output features, by which I understood o_t (the output gate activation), not h_t. However, since h_t = o_t * tanh(c_t) and o_t itself is never fed back into the LSTM (only h_t and c_t are), the description looks reasonable.
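For concreteness, here is a small sketch of the check that convinced me output really holds h_t for every timestep (single-layer, unidirectional; the shapes are arbitrary):

```python
import torch
import torch.nn as nn

seq_len, batch, input_size, hidden_size = 5, 3, 10, 20
lstm = nn.LSTM(input_size, hidden_size)  # num_layers=1, not bidirectional

x = torch.randn(seq_len, batch, input_size)
output, (h_n, c_n) = lstm(x)

print(output.shape)                         # torch.Size([5, 3, 20]): one h_t per t
print(torch.allclose(output[-1], h_n[0]))   # True: the last slice of output is h_n
```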
But in a few examples I have seen, to get the hidden state representation for each word in a sequence (a language modeling task), the code loops through all the words in that sequence. If output already gives us all the hidden state representations (h_t for every t, from the final layer), why do we need to loop through all the words of a sequence?
I asked a similar question before - How to retrieve hidden states for all time steps in LSTM or BiLSTM? - where @smth answered that "To get individual hidden states, you have to indeed loop over for each individual timestep and collect the hidden states". But now I feel the individual hidden states for each timestep are already provided in output. (Please correct me if I am wrong.)
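To make the comparison concrete, here is what I mean (a minimal sketch; single-layer, unidirectional, arbitrary shapes):

```python
import torch
import torch.nn as nn

seq_len, batch, input_size, hidden_size = 5, 3, 10, 20
lstm = nn.LSTM(input_size, hidden_size)
x = torch.randn(seq_len, batch, input_size)

# Approach 1: one call; output already stacks h_t for every t.
output, _ = lstm(x)

# Approach 2: loop over timesteps and collect each h_t manually.
h = c = torch.zeros(1, batch, hidden_size)
states = []
for t in range(seq_len):
    _, (h, c) = lstm(x[t:t+1], (h, c))
    states.append(h[0])
looped = torch.stack(states)             # (seq_len, batch, hidden_size)

print(torch.allclose(output, looped))    # True, so the loop seems redundant
```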
Moreover, this leads me to another question: if we need o_t from the last layer of the RNN, how can we get it? Can anyone shed some light on this?
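In case it helps, here is my current guess at how o_t could be recomputed from the layer's weights, using the gate equation o_t = sigmoid(W_io x_t + b_io + W_ho h_{t-1} + b_ho) from the docs and the fact that PyTorch concatenates the gate weights in (i, f, g, o) order. I am not sure this is the intended way:

```python
import torch
import torch.nn as nn

seq_len, batch, input_size, hidden_size = 5, 3, 10, 20
lstm = nn.LSTM(input_size, hidden_size)
x = torch.randn(seq_len, batch, input_size)

# The output gate parameters are the last hidden_size rows of each weight.
W_io = lstm.weight_ih_l0[3 * hidden_size:]   # (hidden_size, input_size)
W_ho = lstm.weight_hh_l0[3 * hidden_size:]   # (hidden_size, hidden_size)
b_o  = lstm.bias_ih_l0[3 * hidden_size:] + lstm.bias_hh_l0[3 * hidden_size:]

output, _ = lstm(x)
# h_{t-1} for each t: zeros at t=0, then the rows of output shifted by one.
h_prev = torch.cat([torch.zeros(1, batch, hidden_size), output[:-1]])

o = torch.sigmoid(x @ W_io.t() + h_prev @ W_ho.t() + b_o)
print(o.shape)   # (seq_len, batch, hidden_size): one o_t per timestep
```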