Do you have all of the sequences in hand at one time?
If so, I’d think you could do something like this in your collate function for your dataloader:
sequences.sort(key=lambda x: len(x), reverse=True)
lengths = [len(sequence) for sequence in sequences]
reversed_sequences = [sequence.flip(0) for sequence in sequences]
padded_reversed = pad_sequence(reversed_sequences, batch_first=True, padding_value=vocab["<pad>"])
return padded_reversed.flip(1)
Basically: reverse each sequence along its time dimension, pad (pad_sequence only pads on the right), then reverse back so the padding ends up at the front.
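As a minimal, self-contained sketch of the reverse-pad-reverse trick (assuming a pad index of 0; `left_pad_collate` is just a name I made up):

```python
import torch
from torch.nn.utils.rnn import pad_sequence

def left_pad_collate(sequences, pad_value=0):
    # flip each sequence along its time dimension
    flipped = [seq.flip(0) for seq in sequences]
    # pad_sequence pads on the right, so the flipped tails get the padding
    padded = pad_sequence(flipped, batch_first=True, padding_value=pad_value)
    # flip back along the time dimension: the padding now sits at the front
    return padded.flip(1)

batch = [torch.tensor([1, 2, 3]), torch.tensor([4, 5])]
out = left_pad_collate(batch)
# out is [[1, 2, 3], [0, 4, 5]]
```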
Then you’d have to mask your network outputs appropriately when measuring loss.
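One common way to handle the masking (not necessarily what your setup uses) is CrossEntropyLoss with ignore_index, so pad positions contribute nothing to the loss; the logits here are random stand-ins:

```python
import torch
import torch.nn as nn

pad_idx = 0
criterion = nn.CrossEntropyLoss(ignore_index=pad_idx)

logits = torch.randn(2, 3, 10)                  # (batch, seq_len, vocab_size)
targets = torch.tensor([[1, 2, 3],
                        [pad_idx, 4, 5]])       # left-padded targets

# flatten to (batch * seq_len, vocab_size) vs. (batch * seq_len,)
loss = criterion(logits.reshape(-1, 10), targets.reshape(-1))
```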
I didn’t test the above, but I don’t see why it wouldn’t work, assuming the rest of your model/training/loss is set up correctly.