I have three questions regarding LSTM.
1) As shown in http://pytorch.org/docs/nn.html#lstm, torch.nn.LSTM returns the hidden and cell Variables at the last step as well as the outputs (the hidden states over all time steps of the last layer). Given an input Variable, I wonder whether the returned last hidden and cell values are the same regardless of whether the input is packed with pack_padded_sequence or fed in directly (see the sketch after the example input below).
Now assume that the input tensor is as follows (seqlen=5, batchsize=3; each letter represents a certain vector, 0 denotes a pad vector, and sequences are left-aligned):
"a b c d e"
"x y z 0 0"
"v w 0 0 0"
2) If we use a bidirectional LSTM, the hidden and cell Variables at the last step are also double-sized along the first dimension, i.e. of shape (num_layers * num_directions, batch, hidden_size) (assuming the number of layers is 1). I wonder if the hidden and cell values at the last step correspond to "e00" for the forward direction (index 0 along the first dimension of the hidden and cell Variables) and "axv" for the backward direction (index 1), as in the sketch below.
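In code, this is the correspondence I am asking about (a sketch with illustrative sizes; the slicing assumes the forward direction comes first along the first dimension, which is what I gather from the docs):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

seq_len, batch_size, input_size, hidden_size = 5, 3, 4, 6
x = torch.randn(seq_len, batch_size, input_size)

bilstm = nn.LSTM(input_size, hidden_size, num_layers=1, bidirectional=True)
output, (h_n, c_n) = bilstm(x)

print(h_n.shape)     # (num_layers * num_directions, batch, hidden) = (2, 3, 6)
h_forward = h_n[0]   # forward direction: state after the last time step
h_backward = h_n[1]  # backward direction: state after reading back to t = 0

# The forward half of `output` at the last step matches h_forward, and the
# backward half at the first step matches h_backward:
print(torch.allclose(output[-1, :, :hidden_size], h_forward))  # True
print(torch.allclose(output[0, :, hidden_size:], h_backward))  # True
```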
3) How can we efficiently obtain the "actual" (not masked) hidden (or cell) outputs at the last valid step of a bidirectional LSTM? For the example input above, how can we get the outputs corresponding to "ezw" for the forward direction and "axv" for the backward direction? Should we use some indexing with the sequence lengths (as in the sketch below), which might be inefficient on GPUs, or are there more elegant ways?
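The only approach I can think of is gather-based indexing with the lengths, something like this sketch (sizes illustrative):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

seq_len, batch_size, input_size, hidden_size = 5, 3, 4, 6
lengths = torch.tensor([5, 3, 2])  # "abcde", "xyz00", "vw000"
x = torch.randn(seq_len, batch_size, input_size)

bilstm = nn.LSTM(input_size, hidden_size, bidirectional=True)
output, _ = bilstm(x)  # output: (seq_len, batch, 2 * hidden_size)

# Forward direction: gather the output at each sequence's true last step,
# i.e. the positions of "e", "z", "w".
idx = (lengths - 1).view(1, batch_size, 1).expand(1, batch_size, hidden_size)
last_forward = output[:, :, :hidden_size].gather(0, idx).squeeze(0)  # "ezw"

# Backward direction: its last output sits at t = 0 ("a", "x", "v"), though
# with unpacked input it has also consumed the pad steps on the way there,
# which is part of what I am asking about.
last_backward = output[0, :, hidden_size:]  # "axv"
```

If packing behaves the way I think it does, feeding a pack_padded_sequence instead would make the returned h_n already hold exactly these states, so the gather would only be needed for the unpacked case.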