Are "left-padded" sequences possible?

Do you have all of the sequences in hand at one time?

If so, I’d think you could do something like this in your collate function for your dataloader:

from torch.nn.utils.rnn import pad_sequence

sequences.sort(key=lambda x: len(x), reverse=True)  # optional; only needed if you later pack
lengths = [len(sequence) for sequence in sequences]

# flip each sequence along its time dimension, right-pad, then flip the
# padded tensor back so the padding ends up on the left
reversed_sequences = [sequence.flip(0) for sequence in sequences]
padded_reversed = pad_sequence(
    reversed_sequences, batch_first=True, padding_value=vocab["<pad>"]
)

return padded_reversed.flip(1)

Basically: flip each sequence, pad on the right, then flip back…
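Untested sketch of the trick on a toy batch (the `left_pad_collate` name and pad value 0 are just for illustration):

```python
import torch
from torch.nn.utils.rnn import pad_sequence

def left_pad_collate(sequences, pad_value=0):
    # flip each sequence, right-pad, then flip the padded tensor back
    reversed_sequences = [seq.flip(0) for seq in sequences]
    padded = pad_sequence(reversed_sequences, batch_first=True, padding_value=pad_value)
    return padded.flip(1)

batch = [torch.tensor([1, 2, 3]), torch.tensor([4, 5])]
print(left_pad_collate(batch))
# tensor([[1, 2, 3],
#         [0, 4, 5]])
```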

Then you’d have to mask your network outputs appropriately when measuring loss.
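One common way to do that masking, assuming your targets use the same pad index as the inputs, is to pass `ignore_index` to cross-entropy so the pad positions contribute nothing (the shapes and pad index below are made up for illustration):

```python
import torch
import torch.nn.functional as F

pad_idx = 0  # hypothetical pad index; use vocab["<pad>"] in practice
logits = torch.randn(2, 3, 5)                      # (batch, seq_len, vocab)
targets = torch.tensor([[1, 2, 3], [pad_idx, 4, 2]])  # (batch, seq_len)

loss = F.cross_entropy(
    logits.reshape(-1, logits.size(-1)),  # flatten to (batch * seq_len, vocab)
    targets.reshape(-1),
    ignore_index=pad_idx,  # pad positions are excluded from the loss
)
```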

I didn’t test the above, but I don’t see why it would not work assuming the rest of your model/training/loss was set up right.