I was following this tutorial online for predicting time series:

I don’t like how the hidden state is reset to zero for each training batch; surely that discards important information about the previous state of the system? Would it be wrong to use the final hidden state of one batch as the initial hidden state for the next training batch in a time series? My project involves a sort of Kalman filter, and in that context it feels natural to keep carrying the previous state forward from a given initial hidden state.
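For context, the carry-over I have in mind looks roughly like this (a hypothetical sketch in PyTorch with made-up dimensions, using detach() so gradients don’t flow back through earlier batches):

```python
import torch
import torch.nn as nn

# Hypothetical setup: one long series split into consecutive chunks,
# carrying the hidden state from chunk to chunk (truncated BPTT).
torch.manual_seed(0)
rnn = nn.GRU(input_size=1, hidden_size=8, batch_first=True)

series = torch.randn(1, 100, 1)      # one long time series
chunks = series.split(20, dim=1)     # consecutive 20-step training batches

h = None                             # None means zeros for the very first chunk
for chunk in chunks:
    out, h = rnn(chunk, h)
    # detach() keeps the *value* of the final state as the next chunk's
    # initial state, but stops gradients from flowing into earlier chunks
    h = h.detach()

print(h.shape)  # torch.Size([1, 1, 8])
```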

No. That state encodes the “present and future” of the time series in that batch; the state transition rules are encoded in the RNN module’s parameters.

A new batch usually contains a new set of time series (or at least non-consecutive [random] input slices), so the old momentary states are not applicable. Instead, for some tasks, it is possible to use static per-series features to initialize the hidden states (via an encoder MLP).
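A minimal sketch of that encoder idea, assuming invented dimensions and static metadata features (all names here are illustrative, not from any particular library recipe):

```python
import torch
import torch.nn as nn

# Hypothetical sketch: an MLP encoder maps static per-series metadata
# (e.g. sensor type, location features) to the RNN's initial hidden state.
hidden_size, meta_dim = 16, 4
encoder = nn.Sequential(
    nn.Linear(meta_dim, 32),
    nn.ReLU(),
    nn.Linear(32, hidden_size),
)
rnn = nn.GRU(input_size=3, hidden_size=hidden_size, batch_first=True)

batch, steps = 8, 50
metadata = torch.randn(batch, meta_dim)   # one static feature vector per series
x = torch.randn(batch, steps, 3)          # the batch of time series

h0 = encoder(metadata).unsqueeze(0)       # shape (num_layers=1, batch, hidden)
out, h_n = rnn(x, h0)
print(out.shape)  # torch.Size([8, 50, 16])
```

This way each series in the batch starts from a learned a priori state instead of zeros.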

Thanks for your reply! So what would be the way to encode a previous state like in Kalman filters? Essentially what I want is a Kalman filter, but with highly non-linear functions modelling the transition and emission matrices (of the linear Kalman filter).

I’m not sure I understand correctly; with RNNs you automatically have some encoded non-zero state once you take at least one step. Gated RNNs can adapt to treat the first step (or a hidden state of zeros) as more important, so they are usable without [non-zero] initial states. If you have multiple time series with different a priori states, you can train an encoder network to produce those states from some metadata.
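If it helps, here is a rough sketch of the Kalman-filter analogy, assuming a GRUCell plays the role of the nonlinear transition function and a small linear layer plays the emission function (dimensions and names are illustrative, not a tested filter design):

```python
import torch
import torch.nn as nn

# Hypothetical nonlinear state-space sketch: the GRUCell replaces the
# Kalman filter's transition matrix, the linear layer its emission matrix.
state_dim, obs_dim = 8, 2
transition = nn.GRUCell(input_size=obs_dim, hidden_size=state_dim)
emission = nn.Linear(state_dim, obs_dim)

obs = torch.randn(30, 1, obs_dim)   # 30 observations, batch of 1
h = torch.zeros(1, state_dim)       # initial state: zeros, or from an encoder

preds = []
for y_t in obs:
    h = transition(y_t, h)          # nonlinear "update" of the latent state
    preds.append(emission(h))       # predicted observation from the state

print(len(preds), preds[0].shape)   # 30 torch.Size([1, 2])
```

Unlike the real Kalman filter, this keeps no explicit covariance; the gates learn how much to trust the incoming observation versus the carried state.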