Building a custom LSTM with packed sequences

I’m trying to build an LSTM variant, specifically the multiplicative LSTM (https://arxiv.org/abs/1609.07959). I want to use it to train a language model on data that varies quite a bit in sentence length, so packing is very appealing.

I think I can build an mLSTM with a slight modification of the LSTM operations and calls along the lines of LSTMCell, but I’m not sure how to use packing with it. All of the examples I’ve seen using torch.nn.utils.rnn.pack_padded_sequence plug the packed sequence into a built-in RNN module like nn.LSTM, which handles everything under the hood. I’m not sure how to approach this, and I’m not entirely certain how the parallelization works with a packed sequence (how does the model know what to evaluate when computing losses?). I’d appreciate any advice. I’ve pasted rough sketches of what I have in mind below.
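For reference, here is roughly the cell I have in mind. It’s just a sketch based on my reading of the paper (the intermediate state m_t replaces h_{t-1} in the gate computations); the class and parameter names are placeholders:

```python
import torch
import torch.nn as nn

class MultiplicativeLSTMCell(nn.Module):
    """Rough sketch of an mLSTM cell: m_t = (W_mx x_t) * (W_mh h_{t-1})
    is used in place of h_{t-1} when computing the gates."""
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.wx = nn.Linear(input_size, 4 * hidden_size)       # x_t -> gates
        self.wm = nn.Linear(hidden_size, 4 * hidden_size)      # m_t -> gates
        self.wmx = nn.Linear(input_size, hidden_size, bias=False)
        self.wmh = nn.Linear(hidden_size, hidden_size, bias=False)

    def forward(self, x, state):
        h, c = state
        m = self.wmx(x) * self.wmh(h)                          # multiplicative state
        i, f, o, g = (self.wx(x) + self.wm(m)).chunk(4, dim=1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        c = f * c + i * torch.tanh(g)
        h = o * torch.tanh(c)
        return h, c
```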
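And this is how I imagine stepping a cell like that over a PackedSequence by hand, using its data and batch_sizes directly, since I can’t hand it to the built-in modules. It assumes the sequences are sorted longest-first (which is what pack_padded_sequence produces); run_cell_over_packed is just a name I made up:

```python
import torch
from torch.nn.utils.rnn import PackedSequence

def run_cell_over_packed(cell, packed, hidden_size):
    data, batch_sizes = packed.data, packed.batch_sizes
    max_batch = int(batch_sizes[0])
    # hidden/cell states for all sequences, kept in the packed (length-sorted) order
    h = data.new_zeros(max_batch, hidden_size)
    c = data.new_zeros(max_batch, hidden_size)
    outputs, offset = [], 0
    for bs in batch_sizes.tolist():
        x_t = data[offset:offset + bs]        # inputs for the bs still-active sequences
        h_new, c_new = cell(x_t, (h[:bs], c[:bs]))
        # sequences that already ended keep their last state untouched
        h = torch.cat([h_new, h[bs:]], dim=0)
        c = torch.cat([c_new, c[bs:]], dim=0)
        outputs.append(h_new)
        offset += bs
    # reuse the original packing metadata for the output
    out = PackedSequence(torch.cat(outputs, dim=0), batch_sizes,
                         packed.sorted_indices, packed.unsorted_indices)
    return out, (h, c)
```

If that’s roughly right, I’m guessing the loss would then be computed directly on out.data against targets packed the same way, so padding never enters the loss at all, but I’m not sure that’s the intended approach.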