I have worked through https://github.com/HarshTrivedi/packing-unpacking-pytorch-minimal-tutorial to understand pad_packed_sequence and pack_padded_sequence. But it does not include code for a full training example. I am assuming the task at hand is seq2seq with the same length for input and output, e.g. a language model.
Some training examples I found online follow this pattern, which I find sub-optimal:
- pad so that all sequences have the same length
- use pack_padded_sequence to pack
- pass it through the RNN, possibly with an MLP on top of the RNN output
- unpack via pad_packed_sequence
- multiply by a mask of zeros for the shorter sequences to stop their gradients
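The pattern above, as a minimal sketch. The toy shapes, random data, and names like `proj` and `mask` are my placeholders, not from any particular tutorial:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

# Toy batch: 3 sequences of lengths 5, 3, 2, right-padded with 0
vocab_size, embed_dim, hidden_dim = 10, 8, 16
lengths = torch.tensor([5, 3, 2])  # must be sorted descending (enforce_sorted=True default)
padded = torch.randint(1, vocab_size, (3, 5))
for i, n in enumerate(lengths):
    padded[i, n:] = 0  # pad positions

embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
proj = nn.Linear(hidden_dim, vocab_size)  # MLP on top of RNN output

# pad -> pack -> RNN -> unpack
packed = pack_padded_sequence(embed(padded), lengths, batch_first=True)
packed_out, _ = rnn(packed)
out, _ = pad_packed_sequence(packed_out, batch_first=True)  # (3, 5, hidden_dim)
logits = proj(out)                                          # (3, 5, vocab_size)

# mask out padded positions so they contribute no loss/gradient
targets = padded  # placeholder targets, same shape as input
mask = (torch.arange(5)[None, :] < lengths[:, None]).float()
loss_per_tok = nn.functional.cross_entropy(
    logits.reshape(-1, vocab_size), targets.reshape(-1), reduction="none"
).reshape(3, 5)
loss = (loss_per_tok * mask).sum() / mask.sum()
```

Note that `proj` still runs over the padded positions before they are masked out, which is the wasted computation I mean.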
I think it would be better to do:
1.-3. same as before (pad, pack, RNN)
4. pack the target variable
5. loss = loss_fun(packed outputs from RNN, packed target)
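A minimal sketch of what I have in mind, again with toy shapes and placeholder names of my own (`proj`, random data). Since the input and the targets are packed with the same lengths, their `.data` tensors align element for element, so the loss can be computed directly on the packed data with no unpacking or masking:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

vocab_size, embed_dim, hidden_dim = 10, 8, 16
lengths = torch.tensor([5, 3, 2])  # sorted descending
inputs = torch.randint(1, vocab_size, (3, 5))   # padded input batch
targets = torch.randint(1, vocab_size, (3, 5))  # padded target batch

embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
proj = nn.Linear(hidden_dim, vocab_size)

# 1-3: pad, pack, RNN
packed_in = pack_padded_sequence(embed(inputs), lengths, batch_first=True)
packed_out, _ = rnn(packed_in)

# 4: pack the target variable with the same lengths
packed_tgt = pack_padded_sequence(targets, lengths, batch_first=True)

# 5: loss on packed data only -- total_valid_steps = lengths.sum() = 10 rows,
# so the MLP and the loss never touch padded positions
logits = proj(packed_out.data)                            # (10, vocab_size)
loss = nn.functional.cross_entropy(logits, packed_tgt.data)
```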
This way I think you save computation. Let me know if you think this is correct, or point me to code for a full training cycle? And I think variable-length sequences are a common enough topic to include in an official tutorial.