Is LSTM Stateful between Inner Batches

When giving the hidden tensor to the forward method of the LSTM, we give an initial vector for each batch, i.e. a tensor of size (num_layers, num_batches, len_hidden). Is there a way for the initial hidden of a batch to be the last hidden of the previous batch (within a single forward run)?
I wrote a more detailed explanation in the comments.

The output of the LSTM is o, (h, c), where o is the output of size (sequence, batch, hidden) and (h, c) is a tuple in which h is the last hidden state. You can take this h and pass it as the initial hidden state for the next batch.
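For example, a minimal sketch (the sizes here are just placeholders, not anything from your setup):

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=16, num_layers=2)

x1 = torch.randn(5, 3, 10)           # (sequence, batch, input)
o1, (h1, c1) = lstm(x1)              # o1: (5, 3, 16); h1, c1: (2, 3, 16)

x2 = torch.randn(5, 3, 10)
o2, (h2, c2) = lstm(x2, (h1, c1))    # x2 starts from the final state of x1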


Hi Prerna, thanks for answering!
But that wasn’t what I was asking… What I don’t get is what happens in a single run.
The way I understand it is this:
input is a tensor of size (num_cells, num_batches, len_inputs), i.e. an input vector for every cell of every batch.
hidden is a tensor of size (num_layers, num_batches, len_hidden), i.e. every batch in the run gets an initial hidden for every layer.
Is that correct?
I don’t want to give it the initial hidden for every batch; I’d like it to use the hidden from the previous batch.
I’d like it to do this without the loop:

import copy
import torch
import torch.nn as nn

len_inputs = ...
num_layers = ...
len_hidden = ...

h = torch.zeros(num_layers, 1, len_hidden)
c = copy.deepcopy(h)
criterion = nn.WhateverLoss()  # placeholder for whichever loss fits

for i in range(len(subsequences)):
    optimizer.zero_grad()
    out, (h, c) = model(subsequences[i], (h, c))
    loss = criterion(out, subsequences[i])
    loss.backward()
    optimizer.step()
    h, c = h.detach(), c.detach()  # don't backprop into earlier subsequences

I don’t understand if that’s what happens under the hood when I tell it:

h = torch.zeros(num_layers, num_subsequences, len_hidden)
c = copy.deepcopy(h)
out, (h, c) = model(subsequences, (h, c))

And if not, how do I make it so?
I hope I made it clear; please tell me if not. I’ve been having trouble with this for a while…
Thanks!

Hi Danny,

input is a tensor of size (num_cells, num_batches, len_inputs), i.e. an input vector for every cell of every batch.

What do you mean by ‘cell’ here? The first dimension of input is the sequence length; I’m not sure what you mean by the number of cells.

Yes, I mean the length of the sequence (subsequence, to be more accurate), i.e. the number of time steps :) The way I visualize it is with cells, where every cell gets a data point as input along with the previous cell’s hidden output.
I just want to be able to break a big sequence into little ones so that I can train on it in pieces, but still have the memory (hidden) generated from the whole sequence.

Hi Danny,

So say you have an input x of size (120, 1, 20) - (sequence, batch, input). Now you break it up into x1 with shape (60, 1, 20) and x2 with shape (60, 1, 20). Then, to retain the hidden state from the end of x1 to the beginning of x2, you would do something like this:

model = nn.LSTM(input_size=20, hidden_size=h_size)
out1, (h1, c1) = model(x1)
out2, (h2, c2) = model(x2, (h1, c1))

Does this answer your question?


Nope:)
I would like to train it in one line:
I want the input to be of shape (60, 2, 20), i.e. a tensor with x1 and x2 stacked along the batch dimension, each one being a batch entry.
And I want the run to be

out = model(x)

This will not produce an error, but I need to give an initial (h, c) for each of the 2 batches. I want to give it only one (h, c), for the first batch, and have it use the output of the first batch as the initial hidden for the second batch. I think it would save computation time.
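Put differently, here is a rough sketch of the behavior I’m after (made-up sizes; x is the (60, 2, 20) tensor whose two batch entries are consecutive halves of the original sequence), but done inside a single call instead of two:

h0 = torch.zeros(num_layers, 1, len_hidden)    # one initial state, only for the first half
c0 = torch.zeros(num_layers, 1, len_hidden)
out1, (h1, c1) = model(x[:, 0:1], (h0, c0))    # first half of the sequence
out2, (h2, c2) = model(x[:, 1:2], (h1, c1))    # second half starts from the first half's final state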
Am I clearer this time?
Thanks!

Hi,
did you manage to solve this problem?
I am trying as well to have a stateful LSTM between different sequences while passing the training data in one go.


@danill, kindly let us know if you were able to solve this.

Do you have any solution for getting an inner stateful LSTM with batch_size > 1?