I have three questions regarding LSTM.
1) As shown in http://pytorch.org/docs/nn.html#lstm, torch.nn.LSTM returns the hidden and cell Variables at the last step as well as the outputs (the hidden states over all time steps of the last layer). Given an input Variable, I wonder whether the returned last hidden and cell values are the same regardless of whether the input is packed with pack_padded_sequence or fed in directly (see the sketch after the example input below).
Now assume that the input tensor is as follows (seqlen=5, batchsize=3; each letter represents a certain vector, 0 denotes a pad vector, and sequences are left-aligned):
"a b c d e"
"x y z 0 0"
"v w 0 0 0"
2) If we use a bidirectional LSTM, the hidden and cell Variables at the last step are also double-sized along the first dimension, i.e. of shape (num_layers * num_directions, batch, hidden_size) (assuming the number of layers is 1). I wonder if the hidden and cell values at the last step correspond to "e00" for the forward direction (index 0 along the first dimension of the hidden and cell Variables) and "axv" for the backward direction (index 1), as in the sketch below.
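In code, this is the correspondence I am asking about (a sketch with illustrative sizes; the slicing assumes the forward direction comes first along the first dimension, which is what I gather from the docs):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

seq_len, batch_size, input_size, hidden_size = 5, 3, 4, 6
x = torch.randn(seq_len, batch_size, input_size)

bilstm = nn.LSTM(input_size, hidden_size, num_layers=1, bidirectional=True)
output, (h_n, c_n) = bilstm(x)

print(h_n.shape)     # (num_layers * num_directions, batch, hidden) = (2, 3, 6)
h_forward = h_n[0]   # forward direction: state after the last time step
h_backward = h_n[1]  # backward direction: state after reading back to t = 0

# The forward half of `output` at the last step matches h_forward, and the
# backward half at the first step matches h_backward:
print(torch.allclose(output[-1, :, :hidden_size], h_forward))  # True
print(torch.allclose(output[0, :, hidden_size:], h_backward))  # True
```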
3) How can we efficiently obtain the "actual" (not masked) hidden (or cell) outputs at the last valid step of a bidirectional LSTM? For the example input above, how can we get the outputs corresponding to "ezw" for the forward direction and "axv" for the backward direction? Should we use some indexing with the sequence lengths (as in the sketch below), which might be inefficient on GPUs, or are there more elegant ways?
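The only approach I can think of is gather-based indexing with the lengths, something like this sketch (sizes illustrative):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

seq_len, batch_size, input_size, hidden_size = 5, 3, 4, 6
lengths = torch.tensor([5, 3, 2])  # "abcde", "xyz00", "vw000"
x = torch.randn(seq_len, batch_size, input_size)

bilstm = nn.LSTM(input_size, hidden_size, bidirectional=True)
output, _ = bilstm(x)  # output: (seq_len, batch, 2 * hidden_size)

# Forward direction: gather the output at each sequence's true last step,
# i.e. the positions of "e", "z", "w".
idx = (lengths - 1).view(1, batch_size, 1).expand(1, batch_size, hidden_size)
last_forward = output[:, :, :hidden_size].gather(0, idx).squeeze(0)  # "ezw"

# Backward direction: its last output sits at t = 0 ("a", "x", "v"), though
# with unpacked input it has also consumed the pad steps on the way there,
# which is part of what I am asking about.
last_backward = output[0, :, hidden_size:]  # "axv"
```

If packing behaves the way I think it does, feeding a pack_padded_sequence instead would make the returned h_n already hold exactly these states, so the gather would only be needed for the unpacked case.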