I'm confused about nn.LSTM

Hi :),

I see that some LSTM model code calls init_hidden() inside forward(), while other code calls it inside __init__().

Where does this difference come from?

Can you be more clear? Which code and where?


I just found the answer.

Actually, my question was: should I initialize the hidden state in every epoch?
If I put init_hidden() in forward(), the hidden state is re-initialized on every forward call, i.e. on every batch, not just once per epoch.
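
To make that concrete, here is a minimal sketch of the "init_hidden() inside forward()" pattern. SeqModel and its sizes are made up for illustration; only nn.LSTM itself is the real API. As far as I know, nn.LSTM already starts from zeros when no state is passed, so the explicit version should give the same output as the default call.

```python
import torch
import torch.nn as nn


class SeqModel(nn.Module):
    """Hypothetical model: the hidden state is rebuilt on every forward() call."""

    def __init__(self, input_size=4, hidden_size=8, num_layers=1):
        super().__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)

    def init_hidden(self, batch_size):
        # Fresh zero tensors for (h_0, c_0), shaped (num_layers, batch, hidden_size).
        shape = (self.num_layers, batch_size, self.hidden_size)
        weight = next(self.parameters())
        return weight.new_zeros(shape), weight.new_zeros(shape)

    def forward(self, x):
        # forward() runs once per batch, so the state is reset once per batch.
        hidden = self.init_hidden(x.size(0))
        out, _ = self.lstm(x, hidden)
        return out


torch.manual_seed(0)
model = SeqModel()
x = torch.randn(3, 5, 4)  # (batch, seq_len, input_size)

out_explicit = model(x)
# Calling the LSTM without a state uses zeros as well.
out_default, _ = model.lstm(x)
same_as_default = torch.allclose(out_explicit, out_default)
```

So with this pattern nothing is remembered from one batch to the next.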

What I found is: if the sequences are independent, like separate text samples, I have to re-initialize the hidden state for every new sequence.
But if the sequences are dependent, like consecutive chunks of a stock price series, the hidden state should not be re-initialized, because it carries over from one chunk to the next.
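
A small sketch of the dependent case, under my own assumptions (a random toy series, chunk length 5, no training). It feeds one long series to an nn.LSTM in consecutive chunks, once carrying the returned state forward and once resetting it for each chunk, and compares both against processing the whole series in a single call.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=1, hidden_size=4, batch_first=True)
series = torch.randn(1, 20, 1)  # one long series: (batch=1, time=20, features=1)

with torch.no_grad():
    # Reference: the whole series in one call.
    full_out, _ = lstm(series)

    # Carry the state from each chunk into the next one.
    state = None  # None means "start from zeros"
    carried = []
    for chunk in series.split(5, dim=1):
        out, state = lstm(chunk, state)
        carried.append(out)
    carried_out = torch.cat(carried, dim=1)

    # Reset the state for every chunk instead.
    reset_out = torch.cat([lstm(chunk)[0] for chunk in series.split(5, dim=1)], dim=1)

carried_matches_full = torch.allclose(full_out, carried_out, atol=1e-5)
reset_matches_full = torch.allclose(full_out, reset_out, atol=1e-5)
```

Carrying the state reproduces the full-series output, while resetting it makes every chunk after the first start without any memory of what came before.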

IMO this largely depends on what is in your training code and how you create batches.

The hidden state of the LSTM encodes what the model has seen so far in the sequence it is processing.
So if you assume a fixed window and the input to the LSTM is a randomized batch of several unrelated windows, you need to re-initialize the hidden state for each training batch.
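
As a rough illustration of the windowed setup, here is a sketch with assumptions of my own: a sine wave as data, a window length of 10, and a batch size of 8. The windows are shuffled by the DataLoader, so neighbouring rows in a batch are unrelated, and the LSTM is called without a state so each batch starts from zeros.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
series = torch.sin(torch.linspace(0, 12, 60))
window = 10  # assumed window length

# Sliding windows of 10 steps, each paired with the value that follows it.
windows = series.unfold(0, window, 1)[:-1].unsqueeze(-1)  # (50, 10, 1)
next_values = series[window:].unsqueeze(-1)               # (50, 1)
loader = DataLoader(TensorDataset(windows, next_values), batch_size=8, shuffle=True)

lstm = nn.LSTM(input_size=1, hidden_size=8, batch_first=True)
head = nn.Linear(8, 1)

batch_shapes = []
for x, y in loader:
    # No state is passed in, so every shuffled batch starts from zeros.
    out, _ = lstm(x)
    pred = head(out[:, -1])  # prediction from the last time step of each window
    batch_shapes.append((tuple(x.shape), tuple(pred.shape)))
```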

If your batch size = 1 and you train without randomization and without windows (not sure how), you still need to initialize the hidden state at the start of each epoch; otherwise the LSTM will think it's a continuation of the previous epoch, not a second pass over the training data.
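
One way this could look in a training loop, as a sketch only: the sine-wave data, the chunk length of 20, and the SGD settings are all assumptions for illustration. The state is set to None at the start of each epoch and carried between consecutive chunks inside the epoch. It is detached after each step, since otherwise the next backward() would try to go through the previous chunk's graph, which PyTorch normally rejects.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=1, hidden_size=8, batch_first=True)
head = nn.Linear(8, 1)
params = list(lstm.parameters()) + list(head.parameters())
optimizer = torch.optim.SGD(params, lr=0.01)
loss_fn = nn.MSELoss()

# Toy data: predict the next value of a sine wave, batch size 1, in order.
wave = torch.sin(torch.linspace(0, 12, 101))
inputs = wave[:-1].view(1, -1, 1)
targets = wave[1:].view(1, -1, 1)

epoch_losses = []
for epoch in range(3):
    state = None  # reset at the start of every epoch (a new pass over the data)
    total = 0.0
    for x, y in zip(inputs.split(20, dim=1), targets.split(20, dim=1)):
        optimizer.zero_grad()
        out, state = lstm(x, state)  # continue from the previous chunk
        loss = loss_fn(head(out), y)
        loss.backward()
        optimizer.step()
        # Keep the values but cut the graph before the next chunk.
        state = tuple(s.detach() for s in state)
        total += loss.item()
    epoch_losses.append(total)
```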