I have three questions regarding LSTM.

- As shown in http://pytorch.org/docs/nn.html#lstm, `torch.nn.LSTM` returns the hidden and cell `Variable`s at the last step, as well as the outputs (the hidden states of the last layer over all time steps). Given an input `Variable`, I wonder whether the returned last hidden and cell values are the same regardless of whether the input is packed with `pack_padded_sequence` or used directly.

Now assume that the input tensor is as follows (seq_len=5, batch_size=3; each letter represents a vector, 0 denotes padding, and sequences are left-aligned):

"a b c d e"

"x y z 0 0"

"v w 0 0 0"
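Regarding the first question, a minimal sketch comparing the two cases with a padded input shaped like the one above (the hidden size and random inputs are arbitrary choices of mine): without packing, the LSTM keeps stepping over the pad positions, so the final hidden state differs for the shorter sequences.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

torch.manual_seed(0)
seq_len, batch, dim, hid = 5, 3, 4, 6
lengths = [5, 3, 2]                     # true lengths of the three sequences
x = torch.randn(seq_len, batch, dim)
for b, n in enumerate(lengths):         # zero out the pad positions
    x[n:, b] = 0.0

lstm = nn.LSTM(dim, hid)

# direct: the LSTM also runs over the pad steps
_, (h_direct, _) = lstm(x)

# packed: pad steps are skipped entirely
_, (h_packed, _) = lstm(pack_padded_sequence(x, lengths))

# identical for the full-length sequence ...
print(torch.allclose(h_direct[0, 0], h_packed[0, 0]))   # True
# ... but not for a padded one: the extra pad steps changed the state
print(torch.allclose(h_direct[0, 1], h_packed[0, 1]))   # False
```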

- If we use a bidirectional LSTM, the hidden and cell `Variable`s at the last step are also doubled in the first dimension (assuming the number of layers is 1). I wonder if the hidden and cell values at the last step correspond to "e 0 0" for the forward part (the first slice along the first dimension of the hidden and cell variables) and "a x v" for the backward part (the second slice).
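On the bidirectional question, the relationship can be inspected directly (a sketch with arbitrary sizes; input is unpacked here). For an unpacked input, `h[0]` is the forward state after the last time step, which for a padded sequence is the "e 0 0" case, and `h[1]` is the backward state at t=0, i.e. "a x v":

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
seq_len, batch, dim, hid = 5, 3, 4, 6
x = torch.randn(seq_len, batch, dim)

lstm = nn.LSTM(dim, hid, bidirectional=True)
out, (h, c) = lstm(x)

print(out.shape)  # (5, 3, 12): forward and backward halves concatenated
print(h.shape)    # (2, 3, 6): h[0] is forward, h[1] is backward

# forward final hidden == forward output at the last time step
print(torch.allclose(h[0], out[-1, :, :hid]))   # True
# backward final hidden == backward output at the first time step
print(torch.allclose(h[1], out[0, :, hid:]))    # True
```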

- How can we efficiently obtain the "actual" (not masked) hidden (or cell) outputs at the last step for a bidirectional LSTM? For the input above, how can we get the outputs corresponding to "e z w" for the forward direction and "a x v" for the backward direction? Should we use some indexing with the sequence lengths, which might be inefficient on GPUs, or is there a more elegant way?
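One possible answer to the last question, sketched below with arbitrary sizes: if the input is packed, the returned `h` already holds the "actual" last states (forward "e z w" in `h[0]`, backward "a x v" in `h[1]`). Alternatively, the last valid positions can be pulled from the padded output with `torch.gather`, which avoids a Python loop and runs on GPU.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)
seq_len, batch, dim, hid = 5, 3, 4, 6
lengths = torch.tensor([5, 3, 2])
x = torch.randn(seq_len, batch, dim)

lstm = nn.LSTM(dim, hid, bidirectional=True)
out_packed, (h, _) = lstm(pack_padded_sequence(x, lengths))
out, _ = pad_packed_sequence(out_packed)   # (seq_len, batch, 2*hid), zero-padded

# Option 1: with packed input, h is already unmasked:
# h[0] = forward state at each sequence's true last step ("e z w"),
# h[1] = backward state at t=0 ("a x v").
fwd_last, bwd_last = h[0], h[1]

# Option 2: gather the time step (length - 1) for every batch element
idx = (lengths - 1).view(1, -1, 1).expand(1, batch, 2 * hid)
last_out = out.gather(0, idx).squeeze(0)   # (batch, 2*hid)

# the forward half of the gathered outputs matches h[0]
print(torch.allclose(fwd_last, last_out[:, :hid]))   # True
```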

Thanks!