I have three questions regarding LSTM.
1) As shown in http://pytorch.org/docs/nn.html#lstm, torch.nn.LSTM returns the hidden and cell Variables at the last step as well as the outputs (the hidden states over all time steps of the last layer). Given an input Variable, I wonder whether the returned last hidden and cell values are the same regardless of whether the input is packed with pack_padded_sequence or fed in directly (see the sketch after the example input below).
Now assume that the input tensor is as follows (seqlen=5, batchsize=3; each letter represents a certain vector, 0 denotes a pad vector, and sequences are left-aligned):
"a b c d e"
"x y z 0 0"
"v w 0 0 0"
2) If we use a bidirectional LSTM, the hidden and cell Variables at the last step are also double-sized along the first dimension, i.e. of shape (num_layers * num_directions, batch, hidden_size) (assuming the number of layers is 1). I wonder if the hidden and cell values at the last step correspond to "e00" for the forward direction (index 0 along the first dimension of the hidden and cell Variables) and "axv" for the backward direction (index 1), as in the sketch below.
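In code, this is the correspondence I am asking about (a sketch with illustrative sizes; the slicing assumes the forward direction comes first along the first dimension, which is what I gather from the docs):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

seq_len, batch_size, input_size, hidden_size = 5, 3, 4, 6
x = torch.randn(seq_len, batch_size, input_size)

bilstm = nn.LSTM(input_size, hidden_size, num_layers=1, bidirectional=True)
output, (h_n, c_n) = bilstm(x)

print(h_n.shape)     # (num_layers * num_directions, batch, hidden) = (2, 3, 6)
h_forward = h_n[0]   # forward direction: state after the last time step
h_backward = h_n[1]  # backward direction: state after reading back to t = 0

# The forward half of `output` at the last step matches h_forward, and the
# backward half at the first step matches h_backward:
print(torch.allclose(output[-1, :, :hidden_size], h_forward))  # True
print(torch.allclose(output[0, :, hidden_size:], h_backward))  # True
```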
3) How can we efficiently obtain the "actual" (not masked) hidden (or cell) outputs at the last valid step of a bidirectional LSTM? For the example input above, how can we get the outputs corresponding to "ezw" for the forward direction and "axv" for the backward direction? Should we use some indexing with the sequence lengths (as in the sketch below), which might be inefficient on GPUs, or are there more elegant ways?
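The only approach I can think of is gather-based indexing with the lengths, something like this sketch (sizes illustrative):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

seq_len, batch_size, input_size, hidden_size = 5, 3, 4, 6
lengths = torch.tensor([5, 3, 2])  # "abcde", "xyz00", "vw000"
x = torch.randn(seq_len, batch_size, input_size)

bilstm = nn.LSTM(input_size, hidden_size, bidirectional=True)
output, _ = bilstm(x)  # output: (seq_len, batch, 2 * hidden_size)

# Forward direction: gather the output at each sequence's true last step,
# i.e. the positions of "e", "z", "w".
idx = (lengths - 1).view(1, batch_size, 1).expand(1, batch_size, hidden_size)
last_forward = output[:, :, :hidden_size].gather(0, idx).squeeze(0)  # "ezw"

# Backward direction: its last output sits at t = 0 ("a", "x", "v"), though
# with unpacked input it has also consumed the pad steps on the way there,
# which is part of what I am asking about.
last_backward = output[0, :, hidden_size:]  # "axv"
```

If packing behaves the way I think it does, feeding a pack_padded_sequence instead would make the returned h_n already hold exactly these states, so the gather would only be needed for the unpacked case.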