LSTM hidden & cell outputs and packed_sequence for variable-length sequence inputs


(Joo-Kyung Kim) #1

I have three questions regarding LSTM.

  1. As shown in http://pytorch.org/docs/nn.html#lstm, torch.nn.LSTM returns the hidden and cell Variables at the last time step as well as the outputs (the hidden states of the last layer over all time steps).
    Given an input Variable, I wonder whether the returned last hidden and cell values are the same regardless of whether the input is packed with pack_padded_sequence or used directly.

Now assume that the input tensor is as follows (seqlen=5, batchsize=3, each letter represents a certain vector, 0 denotes the pad vector, and the sequences are left-aligned):
"a b c d e"
"x y z 0 0"
"v w 0 0 0"
  2. If we use a bidirectional LSTM, the hidden and cell Variables at the last step are also double-sized (assuming the number of layers is 1).
    I wonder if the hidden and cell values at the last step correspond to "e00" for the forward part (the first direction slice of the hidden and cell Variables) and "axv" for the backward part (the second direction slice); see the shape sketch after this list.

  3. How can we efficiently obtain the "actual" (not masked) hidden (or cell) outputs at the last step for a bidirectional LSTM? For example, for the input above, how can we get the outputs corresponding to "ezw" for the forward direction and "axv" for the backward direction? Should we use some indexing with the sequence lengths, which would be inefficient on GPUs, or are there more elegant ways?
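
For concreteness, here is a minimal sketch of the setup above, written with the current pad_sequence API rather than Variables; the sizes (dim=4, hidden=8) are made up for illustration, and the printed shapes show what a 1-layer bidirectional LSTM returns for this batch:

import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence

dim, hidden = 4, 8
seqs = [torch.randn(5, dim),   # "a b c d e"
        torch.randn(3, dim),   # "x y z"
        torch.randn(2, dim)]   # "v w"
batch = pad_sequence(seqs)     # (seqlen=5, batchsize=3, dim), zero-padded on the right

lstm = nn.LSTM(dim, hidden, num_layers=1, bidirectional=True)
output, (h_n, c_n) = lstm(batch)
print(output.shape)  # (5, 3, 2 * hidden): forward and backward outputs concatenated
print(h_n.shape)     # (2, 3, hidden): one slice per direction, i.e. "double-sized"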

Thanks!


(James Bradbury) #2

For a forward RNN, the returned last hidden and cell values are e00 if you don’t use PackedSequence, but they’re ezw if you do. For the backward direction of a bidirectional RNN, they’re axv in both cases, but the RNN will have started at ezw in the PackedSequence case and e00 in the case without it.
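
A minimal sketch of this behavior, using the current pack_padded_sequence API rather than Variables (sizes are made up; the comments restate the explanation above):

import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_sequence

dim, hidden = 4, 8
seqs = [torch.randn(5, dim), torch.randn(3, dim), torch.randn(2, dim)]  # "abcde", "xyz", "vw"
lengths = torch.tensor([5, 3, 2])
batch = pad_sequence(seqs)       # (5, 3, dim), zeros on the right

lstm = nn.LSTM(dim, hidden)      # forward (unidirectional) LSTM

# Without packing, every sequence is run for all 5 timesteps, so h_n is the
# state after also consuming the padding ("e00").
_, (h_pad, _) = lstm(batch)

# With packing, padded steps are skipped, so h_n is the state at each
# sequence's true last element ("ezw").
_, (h_packed, _) = lstm(pack_padded_sequence(batch, lengths))

# The full-length sequence sees no padding either way, so its final state is
# identical in both cases; the two shorter sequences generally differ.
assert torch.allclose(h_pad[:, 0], h_packed[:, 0])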


(Joo-Kyung Kim) #3

@jekbradbury Thanks! That means RNNs with PackedSequence inputs already return the hidden and cell states exactly as I wanted!


(Christophe Cerisara) #4

When building a PackedSequence, is the padding always assumed to be on the right? Is there a way to give sequences that are padded on the left?
Thanks for the great work!


(James Bradbury) #5

Yes, the padding is always assumed to be on the right. There are reasons to put the padding on the left if your model performs computations on the padding tokens and you want to minimize the distortion they cause, but if there are reasons to prefer left-padding over right-padding when the padding tokens won't be used at all, we must have overlooked them.
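
If existing data happens to be left-padded, one option is to shift each sequence before packing; the helper below is hypothetical (not part of the PyTorch API), just to sketch the idea:

import torch

def left_to_right_padding(x, lengths):
    # Move the real timesteps of a left-padded (seq_len, batch, dim) tensor to
    # the front, so the zero padding ends up on the right as expected.
    out = torch.zeros_like(x)
    max_len = x.size(0)
    for b, n in enumerate(lengths):
        out[:n, b] = x[max_len - n:, b]
    return out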


(Christophe Cerisara) #6

No, it’s fine to pad on the right!
I just wanted to know whether it is a constraint or not, because then I have to modify my old code to meet that constraint, that’s all. Thanks!


(Nick Young) #7

I have a question: what does “padding on the right” mean?

In a real situation, “a” is represented by a vector, right? So one sample, “x y z 0 0”, is represented by a matrix.

So by “padding on the right” you mean “padding several all-zero rows below the useful data”?

Am I right?


(James Bradbury) #8

Yes. The padding is on the right if the timesteps of the sentence run from left to right and you think of the vector for x or a as going into the screen.
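
A tiny sketch of that layout for the sample “x y z 0 0” (the vector size is arbitrary):

import torch

dim = 4                                   # arbitrary feature size
x_vec, y_vec, z_vec = torch.randn(dim), torch.randn(dim), torch.randn(dim)
pad = torch.zeros(dim)

# The real timesteps come first, followed by all-zero padding rows.
sample = torch.stack([x_vec, y_vec, z_vec, pad, pad])   # shape: (5, dim)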


(Joo Sung Yoon) #9

I have another question :)
I saw some code suggesting that if the LSTM receives its input as a packed sequence (from pack_padded_sequence), it doesn’t need an initial hidden and cell state.

For example, without pack_padded_sequence:
out, hidden = self.lstm(input, (h0, c0))

With pack_padded_sequence:
out, hidden = self.lstm(pack_padded_sequence_variable)  # without (h0, c0)

I can’t understand how this works. I read the PyTorch docs but couldn’t figure it out. Can anyone help me?


#10

Providing an initial hidden state is independent of whether you give packed sequences or not. If you don’t provide an initial hidden state, it is initialized to zero.
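
A minimal sketch of that equivalence (made-up sizes; the same holds for unpacked input):

import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

lstm = nn.LSTM(input_size=4, hidden_size=8, num_layers=1)
packed = pack_padded_sequence(torch.randn(5, 3, 4), torch.tensor([5, 3, 2]))

# Passing explicit zero initial states ...
h0 = torch.zeros(1, 3, 8)   # (num_layers * num_directions, batch, hidden_size)
c0 = torch.zeros(1, 3, 8)
_, (h_a, c_a) = lstm(packed, (h0, c0))

# ... gives the same result as omitting them, because a missing (h0, c0) is
# initialized to zeros.
_, (h_b, c_b) = lstm(packed)
assert torch.allclose(h_a, h_b) and torch.allclose(c_a, c_b)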