How to handle variable-size sequences in PyTorch's LSTM module?

I know how to handle variable-size sequence data in PyTorch in most cases. For example, presume we want to predict the next position of a person given their past locations. Here we can pad the different sequences with the pad_sequence function, then use pack_padded_sequence and pad_packed_sequence to process a batch of sequences of diverse lengths. Finally, we can apply a mask to the LSTM output and the ground-truth positions, calculate the loss, and backpropagate.
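To make that concrete, here is a minimal sketch of the pipeline I mean (the sequences, targets, and hidden size are made up for illustration):

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence, pad_packed_sequence

# Three (x, y) location sequences of different lengths.
seqs = [torch.randn(5, 2), torch.randn(3, 2), torch.randn(2, 2)]
lengths = torch.tensor([len(s) for s in seqs])

padded = pad_sequence(seqs, batch_first=True)            # (3, 5, 2)
packed = pack_padded_sequence(padded, lengths, batch_first=True, enforce_sorted=False)

lstm = nn.LSTM(input_size=2, hidden_size=16, batch_first=True)
head = nn.Linear(16, 2)                                  # predicts the next (x, y)

out_packed, _ = lstm(packed)
out, _ = pad_packed_sequence(out_packed, batch_first=True)   # (3, 5, 16)
pred = head(out)

# Dummy ground-truth next positions, one per time step.
target = torch.randn(3, 5, 2)

# Mask so padded time steps do not contribute to the loss.
mask = (torch.arange(5)[None, :] < lengths[:, None]).unsqueeze(-1)  # (3, 5, 1)
loss = ((pred - target) ** 2 * mask).sum() / mask.sum()
loss.backward()
```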

So for the example above, I'm aware of how to handle variable-size sequences during training.

But consider another application of an LSTM: we take the hidden state of the LSTM at the last time step, concatenate it with other features, and pass it to a deep neural network. The deep neural network then gives us a loss, and I want to train all of the networks in an end-to-end fashion. In this case I don't know how to handle variable-size sequences. Here we have no ground truth for the LSTM outputs of the different sequences, so we can't use masking after pack_padded_sequence and pad_packed_sequence; instead, the only loss is the one backpropagated from the deep neural network.
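A minimal sketch of the architecture I have in mind (the extra features, target, and layer sizes are made up). My understanding is that for a packed input, h_n already holds the hidden state at each sequence's last real time step, but I'm not sure whether this is the right way to handle the variable lengths here:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence

# Variable-length sequences plus per-sample side features.
seqs = [torch.randn(5, 2), torch.randn(3, 2), torch.randn(2, 2)]
lengths = torch.tensor([len(s) for s in seqs])
extra = torch.randn(3, 4)        # made-up per-sample features
target = torch.randn(3, 1)       # made-up target for the downstream network's loss

padded = pad_sequence(seqs, batch_first=True)
packed = pack_padded_sequence(padded, lengths, batch_first=True, enforce_sorted=False)

lstm = nn.LSTM(input_size=2, hidden_size=16, batch_first=True)
mlp = nn.Sequential(nn.Linear(16 + 4, 32), nn.ReLU(), nn.Linear(32, 1))

_, (h_n, _) = lstm(packed)       # h_n: (num_layers, batch, hidden) = (1, 3, 16)
last_hidden = h_n[-1]            # (3, 16): one vector per sequence

# Concatenate with the other features and feed the deep network.
features = torch.cat([last_hidden, extra], dim=1)
loss = nn.functional.mse_loss(mlp(features), target)
loss.backward()                  # gradients flow back into the LSTM end to end
```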