I’m trying to reproduce some previous work I did with Theano in PyTorch, with RNNs.
I want to be able to mask the sequences I pass as input to an RNN-based model. This should be easy enough…
However, there are a couple of annoying issues that are bugging me:
PackedSequence inputs are only supported by RNN layers, which means I have to go back and forth between pack_padded_sequence and pad_packed_sequence whenever RNN layers interact with other types of layers.
pack_padded_sequence requires the list of sequence lengths to be sorted. This is pretty much the same as bucketing, and it conditions training by not allowing random sampling. Given this requirement, how can I combine a DataLoader with PackedSequence without having to fully sort my dataset by length?
And finally, one last question:
Is it possible to mask the loss function natively in PyTorch?
Yes, for now you have to pack and unpack each time you mix RNNs and conv layers. If you’re mixing RNNs and fully connected layers, you don’t have to unpack at all: you can call the linear layer directly on the packed sequence’s data tensor.
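To illustrate the fully connected case, here is a minimal sketch. The layer sizes, the GRU, and the variable names are assumptions made up for the example. It applies nn.Linear to the flat `.data` tensor of the packed RNN output and only unpacks at the very end to inspect the result.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import (
    PackedSequence,
    pack_padded_sequence,
    pad_packed_sequence,
)

torch.manual_seed(0)

# Hypothetical sizes, chosen only for illustration.
batch, max_len, in_dim, hid_dim, out_dim = 3, 5, 4, 6, 2
padded = torch.randn(batch, max_len, in_dim)
lengths = torch.tensor([5, 3, 2])  # already in decreasing order

rnn = nn.GRU(in_dim, hid_dim, batch_first=True)
linear = nn.Linear(hid_dim, out_dim)

packed_in = pack_padded_sequence(padded, lengths, batch_first=True)
packed_out, _ = rnn(packed_in)

# packed_out.data is a flat (sum(lengths), hid_dim) tensor with no padding
# rows, so the linear layer only ever sees real time steps.
projected = linear(packed_out.data)

# Re-wrap the projected data with the original bookkeeping so it can be
# fed to another RNN or unpacked later.
packed_proj = PackedSequence(
    projected,
    packed_out.batch_sizes,
    packed_out.sorted_indices,
    packed_out.unsorted_indices,
)

# Unpack only to inspect the result; padded positions come back as zeros.
unpacked, out_lengths = pad_packed_sequence(packed_proj, batch_first=True)
print(projected.shape, unpacked.shape, out_lengths.tolist())
```

This works because a linear layer treats every row independently, so the time-major interleaving inside the packed data does not matter. A conv layer mixes neighbouring time steps, which is why it still needs the padded layout.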
The list just has to be sorted (by decreasing length) within an individual batch, so you can still shuffle your dataset and randomly sample a batch at a time, then sort each batch to send it to pack_padded_sequence.
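One way to wire this into a DataLoader is a custom collate function that sorts each sampled batch before packing it. The sketch below assumes a toy in-memory dataset of variable-length tensors, and `sort_and_pack` is a hypothetical helper name, not part of PyTorch.

```python
import torch
from torch.utils.data import DataLoader
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence

torch.manual_seed(0)

# Hypothetical toy dataset: 8 sequences of 3-dim features, varying lengths.
dataset = [torch.randn(n, 3) for n in (4, 1, 6, 2, 5, 3, 7, 2)]


def sort_and_pack(batch):
    """Sort one batch by decreasing length, pad it, then pack it."""
    batch = sorted(batch, key=len, reverse=True)
    lengths = torch.tensor([len(seq) for seq in batch])
    padded = pad_sequence(batch, batch_first=True)  # (B, T_max, 3)
    return pack_padded_sequence(padded, lengths, batch_first=True)


# shuffle=True keeps sampling random; sorting happens per batch only.
loader = DataLoader(dataset, batch_size=4, shuffle=True, collate_fn=sort_and_pack)

for packed in loader:
    # batch_sizes[t] is the number of sequences still active at step t.
    print(packed.data.shape, packed.batch_sizes.tolist())
```

If each sequence comes with a target, the targets need to be reordered with the same permutation inside the collate function. Newer PyTorch releases also accept `enforce_sorted=False` in pack_padded_sequence, which performs the sort internally and restores the original order on unpacking.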