I have three questions regarding LSTM.
1) As shown in http://pytorch.org/docs/nn.html#lstm, torch.nn.LSTM returns the hidden and cell Variables at the last step as well as the outputs (the hidden states of the last layer over all time steps). Given an input Variable, I wonder whether the returned last hidden and cell values are the same regardless of whether the input is packed as a PackedSequence or used directly.
Now assume that the input tensor is as follows (seqlen=5, batchsize=3, left-aligned; each letter represents a certain vector and 0 denotes padding):
"a b c d e"
"x y z 0 0"
"v w 0 0 0"
2) If we use a bidirectional LSTM, the hidden and cell Variables at the last step are also double-sized (assuming the number of layers is 1). I wonder if the hidden and cell values at the last step correspond to “e00” for the forward part (the first entry along the first dimension of the hidden and cell Variables) and “axv” for the backward part (the second entry).
3) How can we efficiently obtain the “actual” (not masked) hidden (or cell) outputs at the last step for a bidirectional LSTM? For example, for the input above, how can we get the outputs corresponding to “ezw” for the forward direction and “axv” for the backward direction? Should we use some indexing with the sequence lengths, which would be inefficient on GPUs? Or are there more elegant ways?
For a forward RNN, the returned last hidden and cell values are e00 if you don’t use PackedSequence, but they’re ezw if you do. For the backward direction of a bidirectional RNN, they’re axv in both cases, but the RNN will have started at ezw in the PackedSequence case and e00 in the case without it.
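A minimal sketch to check the forward case, assuming a recent PyTorch where plain tensors replace Variable. The sizes and the seed are arbitrary choices for illustration, and the lengths 5, 3, 2 mirror the "a b c d e" / "x y z 0 0" / "v w 0 0 0" example:

```python
import torch
from torch import nn
from torch.nn.utils.rnn import pack_padded_sequence

torch.manual_seed(0)
seq_len, batch, feat, hid = 5, 3, 4, 6   # arbitrary illustrative sizes
lengths = [5, 3, 2]                      # true lengths, sorted in descending order

# Right-padded input of shape (seq_len, batch, feat); padded steps are all zeros.
x = torch.randn(seq_len, batch, feat)
for b, n in enumerate(lengths):
    x[n:, b] = 0.0

lstm = nn.LSTM(feat, hid)  # unidirectional, single layer

with torch.no_grad():
    # Without packing, the LSTM also steps over the padding ("e00").
    out_pad, (h_pad, c_pad) = lstm(x)

    # With packing, each sequence stops at its own last real step ("ezw").
    packed = pack_padded_sequence(x, lengths)
    _, (h_packed, c_packed) = lstm(packed)

    # Reference: run every sequence alone, truncated to its true length.
    h_ref = torch.cat(
        [lstm(x[:n, b:b + 1])[1][0] for b, n in enumerate(lengths)], dim=1
    )
```

Here `h_packed` should agree with `h_ref`, while `h_pad` is simply the output at the final time step, padding included.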
@jekbradbury Thanks! That means RNNs with PackedSequence inputs already return the hidden and cell states exactly as I wanted!
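For the bidirectional case, here is a small sketch of how the returned last hidden state lines up with the per-step outputs when the input is packed. It assumes a recent PyTorch with plain tensors. The sizes are arbitrary, and the gather-based indexing is just one way to do the comparison, not necessarily the most efficient one:

```python
import torch
from torch import nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)
feat, hid = 4, 6                        # arbitrary illustrative sizes
lengths = torch.tensor([5, 3, 2])       # true lengths, sorted in descending order
x = torch.randn(5, 3, feat)             # (seq_len, batch, feat), right-padded
for b, n in enumerate(lengths.tolist()):
    x[n:, b] = 0.0

bilstm = nn.LSTM(feat, hid, bidirectional=True)  # single layer

with torch.no_grad():
    packed_out, (h_n, c_n) = bilstm(pack_padded_sequence(x, lengths))
    out, _ = pad_packed_sequence(packed_out)     # (seq_len, batch, 2 * hid)

# h_n has shape (num_directions, batch, hid) for a single layer.
h_fwd, h_bwd = h_n[0], h_n[1]

# Forward half of the output at each sequence's last real step ("ezw").
idx = (lengths - 1).view(1, -1, 1).expand(1, out.size(1), hid)
last_fwd = out[:, :, :hid].gather(0, idx).squeeze(0)

# Backward half of the output at the first step ("axv").
first_bwd = out[0, :, hid:]
```

With packed input, `h_fwd` should match `last_fwd` and `h_bwd` should match `first_bwd`, so the explicit indexing is only needed when the input is not packed.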
When building a PackedSequence, is the padding always assumed to be on the right? Is there a way to give sequences that are padded on the left?
Thanks for the great work!
Yes, the padding is always assumed to be on the right. There are reasons to put the padding on the left if your model is performing computations on the padding tokens and you want to minimize the distortion, but if there are reasons to use left-padding rather than right-padding when the padding tokens won’t be used, we must have overlooked them.
No, it’s fine to pad on the right!
I just wanted to know whether it is a constraint or not, because I would then have to modify my old code to fulfil that constraint, that’s all. Thanks!
I have a question. What does “padding on the right” mean?
In a real situation, “a” is represented by a vector, right? So one sample “x y z 0 0” is represented by a matrix.
So by “padding on the right” you mean “padding several all-zero rows under the useful data”.
Am I right?
Yes. The padding is on the right if the timesteps of the sentence are left to right and you think of the vector of x or a as going into the screen.
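To make the layout concrete, here is a small sketch using `torch.nn.utils.rnn.pad_sequence`, assuming a PyTorch version that ships it. The feature size and the all-ones data are made up for illustration:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

feat = 4  # arbitrary feature size; each "letter" is a vector of this length

# Three sequences of lengths 5, 3 and 2; every time step is one row of a matrix.
seqs = [torch.ones(5, feat), torch.ones(3, feat), torch.ones(2, feat)]

# Default layout is (max_len, batch, feat); shorter sequences get
# all-zero rows appended after their last real time step.
padded = pad_sequence(seqs)

second = padded[:, 1]  # the "x y z 0 0" sample as a (5, feat) matrix
```

In `second`, the first three rows hold data and the last two rows are all zeros, which is what “padding on the right” refers to.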
I have another question :)
I saw some code which suggests that if the LSTM gets its input as a packed sequence (pack_padded_sequence), it doesn’t need the initial hidden and cell states:
out, hidden = self.lstm(input, (h0, c0))
packed = self.lstm(pack_padded_sequence_variable) #without (h0,c0)
I cannot understand how this works. I read the PyTorch docs but couldn’t figure it out. Is there anyone who can help me?
Providing an initial hidden state is independent of whether you give packed sequences or not. If you don’t give an initial hidden state, it is initialized to zero.
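A quick way to convince yourself, as a sketch assuming a recent PyTorch with plain tensors (sizes are arbitrary): passing explicit zero states gives the same result as omitting them, for both padded and packed input.

```python
import torch
from torch import nn
from torch.nn.utils.rnn import pack_padded_sequence

torch.manual_seed(0)
feat, hid, batch = 4, 6, 3              # arbitrary illustrative sizes
lengths = [5, 3, 2]                     # sorted in descending order
lstm = nn.LSTM(feat, hid)               # single layer, unidirectional
x = torch.randn(5, batch, feat)

# Explicit zero initial states, shape (num_layers * num_directions, batch, hid).
h0 = torch.zeros(1, batch, hid)
c0 = torch.zeros(1, batch, hid)

with torch.no_grad():
    # Padded input, without and with explicit zero states.
    _, (h_a, c_a) = lstm(x)
    _, (h_b, c_b) = lstm(x, (h0, c0))

    # Packed input, without and with explicit zero states.
    packed = pack_padded_sequence(x, lengths)
    _, (h_c, c_c) = lstm(packed)
    _, (h_d, c_d) = lstm(packed, (h0, c0))
```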
Hey James, I have a use case for left-padding when the padding token won’t be used, described in the thread “Are "left-padded" sequences possible?”. Is there an elegant way to handle this? Or would it be simpler for me to just accept the extra calculations and process the padding tokens on the left?
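One possible workaround, offered only as a sketch and not as something PyTorch provides: shift each left-padded sequence so that its padding ends up on the right, then pack as usual. The helper name `left_to_right_padded` is hypothetical, and the input is assumed to have shape (seq_len, batch, feat) with known lengths:

```python
import torch

def left_to_right_padded(x, lengths):
    """Hypothetical helper: move left padding to the right.

    x:       (seq_len, batch, feat) tensor, left-padded
    lengths: (batch,) int64 tensor of true sequence lengths
    """
    seq_len = x.size(0)
    shift = seq_len - lengths                  # leading pad steps per sequence
    t = torch.arange(seq_len).unsqueeze(1)     # (seq_len, 1)
    # Output step t reads input step t + shift, wrapping around so the
    # leading padding lands after the real data.
    src = (t + shift.unsqueeze(0)) % seq_len   # (seq_len, batch)
    idx = src.unsqueeze(-1).expand_as(x)       # (seq_len, batch, feat)
    return x.gather(0, idx)

# Two sequences with feat=1: [1, 2, 3, 4] and left-padded [0, 0, 5, 6].
x = torch.tensor([[1., 0.], [2., 0.], [3., 5.], [4., 6.]]).unsqueeze(-1)
lengths = torch.tensor([4, 2])
right = left_to_right_padded(x, lengths)
```

After this, `right[:, 1, 0]` holds 5, 6, 0, 0, and the result can go through `pack_padded_sequence` like any right-padded batch, provided the lengths are sorted as that function expects.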