Do you have all of the sequences in hand at one time?
If so, I’d think you could do something like this in your collate function for your dataloader:
sequences.sort(key=lambda x: len(x), reverse=True)
lengths = [len(sequence) for sequence in sequences]
reversed_sequences = [sequence.flip(0) for sequence in sequences]
padded_reversed = pad_sequence(reversed_sequences, batch_first=True, padding_value=vocab["<pad>"])
return padded_reversed.flip(1)
Basically: reverse each sequence along its time dimension, pad (pad_sequence only pads on the right), then reverse back so the padding ends up at the front.
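As a minimal, self-contained sketch of the reverse-pad-reverse trick (assuming a pad index of 0; `left_pad_collate` is just a name I made up):

```python
import torch
from torch.nn.utils.rnn import pad_sequence

def left_pad_collate(sequences, pad_value=0):
    # flip each sequence along its time dimension
    flipped = [seq.flip(0) for seq in sequences]
    # pad_sequence pads on the right, so the flipped tails get the padding
    padded = pad_sequence(flipped, batch_first=True, padding_value=pad_value)
    # flip back along the time dimension: the padding now sits at the front
    return padded.flip(1)

batch = [torch.tensor([1, 2, 3]), torch.tensor([4, 5])]
out = left_pad_collate(batch)
# out is [[1, 2, 3], [0, 4, 5]]
```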
Then you’d have to mask your network outputs appropriately when measuring loss.
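One common way to handle the masking (not necessarily what your setup uses) is CrossEntropyLoss with ignore_index, so pad positions contribute nothing to the loss; the logits here are random stand-ins:

```python
import torch
import torch.nn as nn

pad_idx = 0
criterion = nn.CrossEntropyLoss(ignore_index=pad_idx)

logits = torch.randn(2, 3, 10)                  # (batch, seq_len, vocab_size)
targets = torch.tensor([[1, 2, 3],
                        [pad_idx, 4, 5]])       # left-padded targets

# flatten to (batch * seq_len, vocab_size) vs. (batch * seq_len,)
loss = criterion(logits.reshape(-1, 10), targets.reshape(-1))
```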
I didn’t test the above, but I don’t see why it wouldn’t work, assuming the rest of your model/training/loss is set up correctly.