LSTM: how to remember hidden and cell states across different batches?

Hi guys,

I was wondering how to remember the hidden and cell states of the LSTM layer across a number of batches. In other words, the last hidden and cell states obtained after training with batch 1 can be used as the initial hidden and cell states for training with batch 2, and so on.

In Lua Torch, I think I can use something like model:remember('both'). Is there a similar way to do this in PyTorch? Thank you very much in advance for your help.



The LSTM module returns two things each time you feed it data: 1. the output, and 2. a tuple holding the final hidden state and cell state. Like this…

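import torch
import torch.nn as nn

# initialise LSTM layer (input size 3, hidden size 3)
lstm = nn.LSTM(3, 3)
# initialise inputs and hidden states
# inputs: a random sequence, shape (seq_len, batch, input_size); the length 5 is arbitrary
inputs = torch.randn(5, 3).view(5, 1, -1)
# hidden is a tuple (hidden state, cell state), each of shape (num_layers, batch, hidden_size)
hidden = (torch.randn(1, 1, 3), torch.randn(1, 1, 3))
# feed the LSTM some data
out, hidden = lstm(inputs, hidden)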

So basically it is up to your code to keep or reset the hidden/cell states between batches.
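
To make the "keep" case concrete, here is a minimal sketch of a training loop that feeds the states returned for one batch into the next. The data, loss, and optimizer are placeholders I made up for illustration. I am also assuming you want to detach the states at each batch boundary (truncated backpropagation through time); without the detach, the second backward() call would try to go back through the previous batch's graph. Note that the batch size has to stay the same from batch to batch for this to work.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

lstm = nn.LSTM(input_size=3, hidden_size=3)
optimizer = torch.optim.SGD(lstm.parameters(), lr=0.1)

# Placeholder data: 4 consecutive batches, each (seq_len=5, batch=2, input_size=3)
batches = [torch.randn(5, 2, 3) for _ in range(4)]
targets = [torch.randn(5, 2, 3) for _ in range(4)]

hidden = None  # passing None makes nn.LSTM start from zero states
for x, y in zip(batches, targets):
    if hidden is not None:
        # Keep the values but cut the graph, so gradients stop at the batch boundary
        hidden = tuple(h.detach() for h in hidden)
    out, hidden = lstm(x, hidden)  # hidden now holds (h_n, c_n) of this batch
    loss = nn.functional.mse_loss(out, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

To reset instead of keep, set hidden back to None (or to fresh zero tensors) wherever a new sequence starts.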

There is a tutorial with more details

OK, I understand. In your example, you initialize the hidden and cell states yourself. But do you know the default behaviour of the LSTM layer? Does it keep the hidden and cell states between batches by default? For instance,

After running: LSTM.forward(batch1), will the hidden and cell states be saved in the model, so that they will be used as initial hidden states in: LSTM.forward(batch2)? Is this the default behaviour?

If this is the default behaviour, then to reset the hidden states between batches we need to set the hidden and cell states to zero in the model's forward definition. If it is not the default behaviour, then to remember the hidden states between batches we need to store the last-step hidden and cell states from the previous batch and pass them in as the initial states for the current batch.

I was unaware that you could call LSTM.forward without providing the hidden states.
According to the source code, if you don’t provide the hidden states, it creates new zero-filled ones on every call and doesn’t store them between batches.
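
A quick way to check this without reading the source is the sketch below (the sizes are arbitrary choices of mine). It compares a call with no states against a call with explicit zero states, and then calls the layer a second time on the same input to see whether anything was remembered.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

lstm = nn.LSTM(input_size=3, hidden_size=3)
x = torch.randn(5, 1, 3)  # (seq_len, batch, input_size)

with torch.no_grad():
    # No states given: the layer builds its own initial states
    out_default, _ = lstm(x)
    # Explicit zero states, shape (num_layers, batch, hidden_size)
    zeros = (torch.zeros(1, 1, 3), torch.zeros(1, 1, 3))
    out_zeros, _ = lstm(x, zeros)
    # Second call without states: would differ if the layer had kept anything
    out_again, _ = lstm(x)

same_as_zeros = torch.allclose(out_default, out_zeros)
stateless = torch.allclose(out_default, out_again)
print(same_as_zeros, stateless)
```

If both comparisons come out equal, the default initial states are zeros and nothing carries over from one call to the next.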

Just so we are clear, in the example I gave, the variable hidden contains both the hidden and cell states.

Yes, you can call LSTM without providing hidden states, and in that case LSTM will use zeros as initial values for both the hidden and cell states. This means that I need to explicitly store and reapply the hidden and cell states in the model's forward definition in order to keep them between batches; otherwise they will just be reset after each batch.
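
For anyone finding this later, a rough sketch of what storing and reapplying the states inside forward could look like. StatefulLSTM and reset_state are names I invented for this example, not part of PyTorch, and the detach is my assumption about how far gradients should flow. It is only loosely similar in spirit to remember('both') in Lua Torch.

```python
import torch
import torch.nn as nn


class StatefulLSTM(nn.Module):
    """Hypothetical wrapper that remembers (h, c) between forward calls."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size)
        self.state = None  # None means "start from zeros"

    def reset_state(self):
        # Call this whenever a new, unrelated sequence begins
        self.state = None

    def forward(self, x):
        out, state = self.lstm(x, self.state)
        # Store detached copies so later backward passes stop at this call
        self.state = tuple(s.detach() for s in state)
        return out


torch.manual_seed(0)
model = StatefulLSTM(3, 3)
x = torch.randn(5, 1, 3)

with torch.no_grad():
    first = model(x)     # starts from zero states
    second = model(x)    # starts from the states left by the first call
    model.reset_state()
    third = model(x)     # starts from zero states again
```

Here third should match first, while second should differ because it began from the remembered states. The stored states only fit while the batch size stays the same, so reset before changing it.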

I understand now and thanks a lot for the help.