"length" argument of pack_padded_sequence with different padding schemes


I was wondering about the implementation of the pack_padded_sequence function from torch.nn.utils.rnn. As I understand it, pack_padded_sequence is applied to an already padded batch, and the result is then passed to an LSTM layer. This lets the LSTM skip computation on the zero-padded elements of the variable-length sequences in the batch. Also as I understand it, the standard padding scheme in PyTorch is to pad at the end of each sequence up to the maximum sequence length in the batch.
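For reference, here is a minimal sketch of the standard end-padded workflow described above. The toy values, the feature size of 1 and the hidden size of 4 are arbitrary choices for illustration; `enforce_sorted=False` is used so the batch does not have to be sorted by length.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

# Two sequences of lengths 3 and 2, zero-padded at the END (batch_first layout).
batch = torch.tensor([[1.0, 2.0, 3.0],
                      [4.0, 5.0, 0.0]]).unsqueeze(-1)  # (batch=2, time=3, features=1)
lengths = torch.tensor([3, 2])  # true (unpadded) length of each sequence

# Packing keeps only the first lengths[i] timesteps of row i.
packed = pack_padded_sequence(batch, lengths, batch_first=True, enforce_sorted=False)
print(packed.batch_sizes)  # number of active sequences per timestep

lstm = nn.LSTM(input_size=1, hidden_size=4, batch_first=True)
packed_out, (h_n, c_n) = lstm(packed)

# Unpack back to a padded tensor; positions past each length are filled with 0.
out, out_lengths = pad_packed_sequence(packed_out, batch_first=True)
print(out.shape, out_lengths)
```

With a packed input, `h_n` holds each sequence's hidden state at its last real timestep (here `h_n[0, 1]` matches `out[1, 1]`), rather than the state after the padding.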

So my question is: can I also use pack_padded_sequence on sequences that are front-padded, or that use split padding (half of the padding at the head and half at the tail)? Will the underlying implementation start “iterating” through each sequence at its first non-padding element, or will it always start at index 0 of each sequence? In the latter case, some non-padding elements might never be processed, because “length” would not coincide with the index at which the actual sequence ends.

Would calling pack_padded_sequence() and then passing the result to an LSTM work the same way for these sequences?
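One way to check this empirically is to pack the same short sequence under all three padding schemes and unpack it again to see which timesteps survived. The values 7 and 8 and the pad value 0 are arbitrary; rows are mixed in one batch only to compare them side by side.

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

PAD = 0.0
# The same real sequence [7, 8] (length 2), padded to length 4 in three ways.
end_padded = torch.tensor([7.0, 8.0, PAD, PAD])
front_padded = torch.tensor([PAD, PAD, 7.0, 8.0])
split_padded = torch.tensor([PAD, 7.0, 8.0, PAD])

batch = torch.stack([end_padded, front_padded, split_padded]).unsqueeze(-1)  # (3, 4, 1)
lengths = torch.tensor([2, 2, 2])  # every row has 2 real elements

packed = pack_padded_sequence(batch, lengths, batch_first=True)
# Round-trip to see which timesteps were actually kept for each row.
unpacked, _ = pad_packed_sequence(packed, batch_first=True)
print(unpacked.squeeze(-1))
```

In this sketch, packing simply takes timesteps `0 .. length-1` of each row, so the front-padded row keeps only padding and the split-padded row keeps one pad plus the first real value.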


(Note: all sequences in each batch would be padded in the same way, but I only use one row per padding type for brevity…)

Based on this comment from the internal implementation, I assume padding is only supported at the end of each sample.
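If padding is indeed only supported at the end, one possible workaround is to shift each row's real elements to the front before packing. The helper below, `to_end_padded`, is hypothetical (not part of PyTorch) and assumes the real elements in each row are contiguous and that you know where each one starts (`starts`), which holds for end, front and split padding.

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence

def to_end_padded(x, starts, lengths, pad_value=0.0):
    """Hypothetical helper: move each row's contiguous real segment to the front.

    x:       (batch, time, features) tensor, batch_first layout
    starts:  (batch,) int64 index of the first real timestep in each row
    lengths: (batch,) int64 number of real timesteps in each row
    """
    batch, time, feat = x.shape
    steps = torch.arange(time).unsqueeze(0)                   # (1, time)
    idx = (starts.unsqueeze(1) + steps).clamp(max=time - 1)   # (batch, time)
    shifted = x.gather(1, idx.unsqueeze(-1).expand(batch, time, feat))
    # Positions past each row's true length become padding again.
    valid = steps < lengths.unsqueeze(1)                      # (batch, time)
    return shifted.masked_fill(~valid.unsqueeze(-1), pad_value)

# End-, front- and split-padded versions of the same sequence [7, 8].
batch = torch.tensor([[7.0, 8.0, 0.0, 0.0],
                      [0.0, 0.0, 7.0, 8.0],
                      [0.0, 7.0, 8.0, 0.0]]).unsqueeze(-1)
starts = torch.tensor([0, 2, 1])   # assumed known, e.g. the number of leading pads
lengths = torch.tensor([2, 2, 2])

fixed = to_end_padded(batch, starts, lengths)
packed = pack_padded_sequence(fixed, lengths, batch_first=True)
print(fixed.squeeze(-1))
```

After the shift every row is end-padded, so the packed data contains only the real values for all three schemes.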