Why do sequences need to get sorted for packing?

In order to make an LSTM deal with variable-length sequences in a batch, I have to use pack_padded_sequence first. But why do the sequences in a batch have to be sorted in decreasing order of length? To match each sequence with the correct training signal, I then have to unsort the output of the unpacked sequences again. I do not understand why this reordering is necessary; it makes the code a whole lot more complex.
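For concreteness, here is a minimal sketch of the sort/pack/unpack/unsort dance I mean (shapes and names are my own; the batch is random toy data):

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

# Toy padded batch: 3 sequences of lengths 5, 2, 4 (batch_first=True).
batch = torch.randn(3, 5, 8)           # (batch, max_len, input_size)
lengths = torch.tensor([5, 2, 4])

# Sort by decreasing length and remember how to undo the sort.
sorted_lengths, sort_idx = lengths.sort(descending=True)
unsort_idx = sort_idx.argsort()

lstm = torch.nn.LSTM(input_size=8, hidden_size=16, batch_first=True)

packed = pack_padded_sequence(batch[sort_idx], sorted_lengths,
                              batch_first=True)
packed_out, (h, c) = lstm(packed)
out, _ = pad_packed_sequence(packed_out, batch_first=True)

# Restore the original batch order so outputs line up with targets again.
out = out[unsort_idx]      # (3, 5, 16)
h = h[:, unsort_idx]       # (1, 3, 16)
```

It is exactly this extra bookkeeping (sort_idx/unsort_idx) that I would like to avoid.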

What I also do not understand is why there is no option to instead pass a mask to the LSTM, so that the LSTM only runs over those elements where the mask is true/1. That way we could achieve the same result as with packing/unpacking, but on the original padded sequences.
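To show what I have in mind, here is a hand-rolled sketch of such a masked LSTM using LSTMCell in a Python loop (slow, and entirely my own construction, but it illustrates the semantics: at padded positions the mask simply freezes the state instead of updating it):

```python
import torch

cell = torch.nn.LSTMCell(input_size=8, hidden_size=16)
batch = torch.randn(3, 5, 8)            # padded batch, any order
lengths = torch.tensor([5, 2, 4])

# mask[b, t] is 1.0 while sequence b is still running, 0.0 on padding.
mask = (torch.arange(5).unsqueeze(0) < lengths.unsqueeze(1)).float()

h = torch.zeros(3, 16)
c = torch.zeros(3, 16)
outputs = []
for t in range(5):
    h_new, c_new = cell(batch[:, t], (h, c))
    m = mask[:, t].unsqueeze(1)
    h = m * h_new + (1 - m) * h   # keep the old state where padded
    c = m * c_new + (1 - m) * c
    outputs.append(h)
out = torch.stack(outputs, dim=1)  # (3, 5, 16)
# After the loop, h already holds each sequence's state at its true last step.
```

No sorting is needed here, which is why I would have expected a mask argument on the built-in LSTM.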

One final question: if I use a unidirectional LSTM and I know the lengths of the sequences but still run the LSTM on the padded sequences, then, if I am only interested in the final output/hidden state of each sequence, I could just manually pick the proper element from the output, right? However, for bidirectional LSTMs there is no way to simulate the processing of the packed sequence with just the padded sequences, because the backward direction will always run over the pad elements as well, which changes the hidden states?
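For the unidirectional case, the manual selection I mean would look something like this (my own sketch; the gather indexing is the only non-obvious part):

```python
import torch

lstm = torch.nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
batch = torch.randn(3, 5, 8)            # padded batch, no packing
lengths = torch.tensor([5, 2, 4])

out, _ = lstm(batch)                    # runs over the padding too: (3, 5, 16)

# Pick the output at the last *real* timestep of each sequence.
idx = (lengths - 1).view(-1, 1, 1).expand(-1, 1, out.size(2))
last = out.gather(1, idx).squeeze(1)    # (3, 16)
```

For a forward-direction LSTM this should be fine, since the output at a valid timestep cannot depend on the padding that comes after it; it is only the backward direction that starts from the pads.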


Crossposted: https://www.quora.com/unanswered/Why-do-sequences-need-to-get-sorted-for-packing

Hopefully we’ll get an answer some day.