Should I pass hidden states to the LSTM or not?

Hi, I have some weather-related data (temperature, etc.) in time series format.

Now I would like to experiment with regression and classification tasks (like predicting tomorrow's weather, or simply classifying a day as sunny/rainy/cloudy).

For this task I have an LSTM network, but I don't know whether I should retain the hidden states across batches or reset them periodically.

Technically speaking:
1.) Should I invoke the recurrent layer without passing hidden states, like this:

lstm_output, _ = lstm(inputs)

2.) Or should I save the hidden states somewhere and pass them back in at every minibatch:

lstm_output, new_hidden_states = self.lstm(inputs, hidden_states)
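
For context, here is a minimal runnable version of what I mean by both options (the layer sizes and tensor shapes are just placeholders):

import torch
import torch.nn as nn

# Toy setup; sizes are placeholders
lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
inputs = torch.randn(4, 10, 8)  # [batch, seq_len, features]

# Option 1: no states passed; PyTorch initializes h_0 and c_0 internally
output1, _ = lstm(inputs)

# Option 2: pass an explicit (h_0, c_0) tuple and keep the returned states
h0 = torch.zeros(1, 4, 16)  # [num_layers, batch, hidden_size]
c0 = torch.zeros(1, 4, 16)
output2, (h_n, c_n) = lstm(inputs, (h0, c0))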

Any advice would be appreciated! 🙂

If each datapoint is already in time-series form, the gradients are accumulated through time automatically, and you don't need to save the last batch's hidden states. Also, since the sequences in consecutive batches are not continuous, keeping the hidden states would lead to the LSTM learning the spurious jumps from one sequence to the next. 🙂
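
Concretely, with randomly sampled windows each batch can just start from a fresh zero state. A minimal training-step sketch (the model, sizes, and data here are assumptions for illustration):

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
head = nn.Linear(16, 1)  # e.g. next-day temperature regression
optimizer = torch.optim.Adam(list(lstm.parameters()) + list(head.parameters()))
loss_fn = nn.MSELoss()

for _ in range(3):  # stand-in for iterating over a DataLoader
    inputs = torch.randn(4, 10, 8)  # randomly sampled windows
    targets = torch.randn(4, 1)

    optimizer.zero_grad()
    output, _ = lstm(inputs)        # fresh zero state every batch
    pred = head(output[:, -1, :])   # use the last time step
    loss = loss_fn(pred, targets)
    loss.backward()                 # BPTT runs over the whole window
    optimizer.step()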

If you don't pass hidden states to the LSTM, they are initialized to zeros, so usually you don't have to worry about them.
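
You can check this equivalence directly; in this sketch (sizes chosen arbitrarily), calling the LSTM with no state matches passing explicit zero tensors:

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
inputs = torch.randn(4, 10, 8)

out_default, _ = lstm(inputs)                    # no state passed
zeros = torch.zeros(1, 4, 16)                    # [num_layers, batch, hidden_size]
out_explicit, _ = lstm(inputs, (zeros, zeros))   # explicit zero state

print(torch.allclose(out_default, out_explicit))  # True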

Thanks for the comments!

Also, I should mention that the batches are sampled randomly. So inside a batch the data is sequential, but the next batch does not contain the points that follow it.

Does this make a difference?

I don't think so. Each sequence in a batch gets its own (zero-initialized) starting state, so it doesn't matter which windows you sample. For an LSTM with batch_first=False, the input should be of shape [L, B, H]; with batch_first=True, it should be [B, L, H], where L is the sequence length, B the batch size, and H the number of input features.
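
For reference, a small sketch showing both layouts (the sizes are arbitrary):

import torch
import torch.nn as nn

B, L, H = 4, 10, 8  # batch, sequence length, features

lstm_tf = nn.LSTM(input_size=H, hidden_size=16)  # batch_first=False (default)
out_tf, _ = lstm_tf(torch.randn(L, B, H))        # input is [L, B, H]
print(out_tf.shape)                              # torch.Size([10, 4, 16])

lstm_bf = nn.LSTM(input_size=H, hidden_size=16, batch_first=True)
out_bf, _ = lstm_bf(torch.randn(B, L, H))        # input is [B, L, H]
print(out_bf.shape)                              # torch.Size([4, 10, 16])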