I understand how padding and pack_padded_sequence work, but I have a question about how this applies to a bidirectional LSTM.
Does the BiLSTM (nn.LSTM with bidirectional=True) automatically reverse the sequence (also when using pad_packed_sequence)?
If so, does the padding affect the first/last timesteps?
For example: seq1=[a, b, c, d, e], seq2=[x, y, z]
and after padding: seq1=[a, b, c, d, e], seq2=[x, y, z, 0, 0]
If we input seq2, does this mean that the BiLSTM
- takes inputs of [x, y, z, 0, 0] and [0, 0, z, y, x],
- or takes inputs of [x, y, z, 0, 0] and [z, y, x, 0, 0]?
Would you please help me clarify these points? Thank you very much
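To make the question concrete, here is a minimal sketch of the setup I mean (the dimensions and variable names are arbitrary, just for illustration):

```python
import torch
import torch.nn as nn

emb = 4  # arbitrary feature size
seq1 = torch.randn(5, emb)  # 5 valid timesteps
seq2 = torch.randn(3, emb)  # 3 valid timesteps, padded with zeros below

# Pad the batch to the longest sequence (length 5)
padded = torch.zeros(2, 5, emb)
padded[0] = seq1
padded[1, :3] = seq2

lstm = nn.LSTM(emb, hidden_size=3, bidirectional=True, batch_first=True)

# Pack with the true lengths (must be sorted in descending order here)
packed = nn.utils.rnn.pack_padded_sequence(
    padded, torch.tensor([5, 3]), batch_first=True)
out_packed, _ = lstm(packed)
out, lens = nn.utils.rnn.pad_packed_sequence(out_packed, batch_first=True)

print(out.shape)  # torch.Size([2, 5, 6]) -- 2 * hidden_size for the two directions
```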
I think the length information is not even required.
Note that we pad the short sequences with zeros. An RNN given hidden=None (at the first frame) actually creates hidden=torch.zeros(...). In this case, [z, y, x, 0, 0] and [0, 0, z, y, x] produce the same results.
But theoretically, both an inverse ([x, y, z, 0, 0] -> [0, 0, z, y, x]) and a shift ([0, 0, z, y, x] -> [z, y, x, 0, 0]) would be required if the padding values were not zeros.
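This can also be checked empirically. The sketch below (toy sizes, a fixed seed, and my own variable names, all chosen for illustration) compares a packed, padded batch against running the unpadded sequence alone; if the outputs match, the backward direction must have started at the last *valid* step (z), never touching the padding:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=4, hidden_size=3, bidirectional=True, batch_first=True)

seq1 = torch.randn(5, 4)  # 5 valid steps
seq2 = torch.randn(3, 4)  # 3 valid steps, zero-padded to length 5 below
batch = torch.zeros(2, 5, 4)
batch[0] = seq1
batch[1, :3] = seq2

packed = nn.utils.rnn.pack_padded_sequence(
    batch, torch.tensor([5, 3]), batch_first=True)
out_packed, _ = lstm(packed)
out, _ = nn.utils.rnn.pad_packed_sequence(out_packed, batch_first=True)

# Reference: run the unpadded seq2 on its own.
out_ref, _ = lstm(seq2.unsqueeze(0))

# The packed output for seq2 matches the unpadded run, so with packing the
# backward direction reads [z, y, x] only -- the padded steps are skipped.
print(torch.allclose(out[1, :3], out_ref[0], atol=1e-6))
```

Without packing (feeding the padded batch directly to the LSTM), the backward direction would instead start from the padding steps, which is exactly why pack_padded_sequence matters for bidirectional models.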