LSTM: how to handle long input sequences with fixed-length BPTT

When processing very long sequences, it is impractical to run BPTT all the way back to the beginning of the sequence. I don’t see anything in the PyTorch docs or examples where a sequence’s length spans multiple minibatches. Say we have the following sequences, padded so they can be fed to nn.LSTM after packing with pack_padded_sequence():
a b c eos
e f eos 0
h i eos 0
The minibatch size is 4 x 3 (seq_len x batch_size). If the hidden states (h, c) are detached from history at every minibatch, the BPTT span = seq_len = 4. Suppose the sequence “e f eos” in this minibatch is the tail of a longer sequence
z y x w e f eos
When processing “e f eos”, if we want to keep the BPTT span fixed at 4, we would have to preserve (h, c) across minibatch boundaries.
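
For concreteness, here is a rough sketch of how such a padded minibatch might be packed and fed to nn.LSTM with an explicit initial state. The embedding dimension, hidden size, and random tensors are made up for illustration; only the shapes matter:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

# Illustrative dimensions (not from the original post).
embed_dim, hidden_dim, batch_size = 8, 16, 3
lstm = nn.LSTM(embed_dim, hidden_dim)

x = torch.randn(4, batch_size, embed_dim)   # (seq_len, batch_size, embed_dim), already padded
lengths = torch.tensor([4, 3, 3])           # true lengths before padding, sorted descending

packed = pack_padded_sequence(x, lengths)

# (h0, c0) could be zeros for fresh sequences, or states carried over from
# the previous minibatch for sequences that are continuations.
h0 = torch.zeros(1, batch_size, hidden_dim)
c0 = torch.zeros(1, batch_size, hidden_dim)
output, (h_t, c_t) = lstm(packed, (h0, c0))
```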

Is there some facility provided by PyTorch to enable this, or is the best way to implement it for the user to keep track of the (h_t, c_t) returned by nn.LSTM and manage detaching and resetting the hidden states?

You should keep track of (h_t, c_t) yourself and manage the .detach() calls.
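
A minimal sketch of that pattern, assuming a simple training loop; the model, loss, optimizer, and dummy batches below are placeholders, not part of the original answer:

```python
import torch
import torch.nn as nn

# Illustrative setup: a 1-layer LSTM trained on a stream of minibatches.
embed_dim, hidden_dim, num_layers, batch_size = 8, 16, 1, 3
model = nn.LSTM(embed_dim, hidden_dim, num_layers)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Start with zero states and reuse them across minibatches.
h = torch.zeros(num_layers, batch_size, hidden_dim)
c = torch.zeros(num_layers, batch_size, hidden_dim)

batches = [torch.randn(4, batch_size, embed_dim) for _ in range(5)]  # dummy data

for x in batches:
    # Detach so backprop stops at the minibatch boundary, while the values
    # of (h, c) still carry the context of earlier minibatches forward.
    h, c = h.detach(), c.detach()

    output, (h, c) = model(x, (h, c))
    loss = criterion(output, torch.zeros_like(output))  # dummy target

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # For sequences that hit eos in this minibatch, you would also reset the
    # corresponding batch columns of h and c to zero before the next step.
```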