I am training a hierarchical model using PyTorch. This has proved considerably difficult because the masking needs a list of lengths in sorted order. In a hierarchical model, I can sort the samples by their number of sentences and get away with padding the sentences. But I will definitely have to mask the words, and PyTorch only supports masking when the sequences are in sorted order.

Is there a reason for enforcing the sorted order, such as some sort of optimization? Can I make it accept inputs without sorting them first?

I have also been working with padded sequences and the need to order them. I am currently running into a problem where I have 2 different inputs and 2 RNNs. I can sort each input by length, but then the batch dimensions of the two no longer line up with each other, so I have to restore the original order after running through the RNN (basically as shown here: RNNs Sorting operations autograd safe?). I’m just worried that this operation is not autograd-safe, i.e. that the gradients get lost or are associated with the wrong matrix entries after re-sorting.
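One way to check the autograd concern is a small experiment: sort, pack, run the RNN, restore the original order, then backpropagate from a single sample and see where the gradient lands. This is only a sketch under assumptions: the shapes, the lengths, and the choice of nn.GRU are made up for illustration, and the input is time-major (T x B x F).

```python
import torch
from torch import nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)

# Hypothetical sizes: T time steps, B samples, F features, H hidden units.
T, B, F, H = 5, 3, 4, 6
x = torch.randn(T, B, F, requires_grad=True)
lengths = torch.tensor([2, 5, 3])  # deliberately not sorted

rnn = nn.GRU(input_size=F, hidden_size=H)

# Sort by length (descending) and remember how to undo the permutation.
sorted_lengths, perm = lengths.sort(descending=True)
inv_perm = perm.argsort()

# Indexing with a permutation is a differentiable gather along the batch dim.
packed = pack_padded_sequence(x[:, perm], sorted_lengths)
packed_out, _ = rnn(packed)
out_sorted, _ = pad_packed_sequence(packed_out, total_length=T)

# Restore the original batch order.
out = out_sorted[:, inv_perm]

# Backpropagate from sample 0 only, then sum |grad| per sample.
out[:, 0].sum().backward()
grad_per_sample = x.grad.abs().sum(dim=(0, 2))
print(grad_per_sample)
```

If the reordering is handled correctly, only the first entry of grad_per_sample should be nonzero, and the padded time steps of sample 0 (x.grad[2:, 0]) should stay at zero.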

For unsorted sequences, pass enforce_sorted=False to pack_padded_sequence. If enforce_sorted is True, the sequences should be sorted by length in decreasing order, i.e. input[:,0] should be the longest sequence and input[:,B-1] the shortest one. enforce_sorted=True is only necessary for ONNX export.
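A minimal sketch of what that looks like in practice, assuming a time-major input (T x B x F) and an nn.GRU; the sizes and lengths are invented for the example:

```python
import torch
from torch import nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)

# Hypothetical sizes: T time steps, B samples, F features, H hidden units.
T, B, F, H = 5, 3, 4, 6
x = torch.randn(T, B, F)
lengths = torch.tensor([2, 5, 3])  # unsorted; must be a list or CPU tensor

rnn = nn.GRU(input_size=F, hidden_size=H)

# enforce_sorted=False lets the packing step sort the batch internally.
packed = pack_padded_sequence(x, lengths, enforce_sorted=False)
packed_out, h_n = rnn(packed)

# Unpacking returns the outputs and lengths in the original batch order.
out, out_lengths = pad_packed_sequence(packed_out, total_length=T)

print(out.shape)       # (T, B, H)
print(out_lengths)     # same order as `lengths`
```

The outputs past each sequence's length are zero-padded, and h_n holds the hidden state at each sequence's last valid step, also in the original order, so no manual re-sorting is needed.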