In an LSTM, why should I reset the hidden state?

Hello, everyone.

I am running an LSTM on multivariate time series data.

First, I need to transform the data from
seq1 = array([10, 20, 30, 40, 50, 60, 70, 80, 90])
seq2 = array([15, 25, 35, 45, 55, 65, 75, 85, 95])
seq3 = array([65, 85, 105, 125, 145, 165, 185])

to
Input Output
[[10 15]
[20 25]
[30 35]] 65
[[20 25]
[30 35]
[40 45]] 85
[[30 35]
[40 45]
[50 55]] 105
[[40 45]
[50 55]
[60 65]] 125
[[50 55]
[60 65]
[70 75]] 145
[[60 65]
[70 75]
[80 85]] 165
[[70 75]
[80 85]
[90 95]] 185
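For reference, here is roughly how such windows can be built with NumPy (split_sequences is just a small helper I wrote for this post, not a library function):

import numpy as np

def split_sequences(features, target, n_steps):
    # features: shape (T, n_features); target: one value per window
    X, y = [], []
    for i in range(len(target)):
        X.append(features[i:i + n_steps])
        y.append(target[i])
    return np.array(X), np.array(y)

seq1 = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90])
seq2 = np.array([15, 25, 35, 45, 55, 65, 75, 85, 95])
seq3 = np.array([65, 85, 105, 125, 145, 165, 185])

features = np.stack([seq1, seq2], axis=1)    # shape (9, 2)
X, y = split_sequences(features, seq3, n_steps=3)
print(X.shape, y.shape)                      # (7, 3, 2) (7,)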

I used the code below.
I don’t understand why I should reset the hidden state each time.
If I want to preserve and update the hidden state instead,
what should I change in the code below?
Thank you in advance.

import torch


class LSTMModel(torch.nn.Module):
    def __init__(self, n_features, seq_length, batch_size):
        super(LSTMModel, self).__init__()
        self.n_features = n_features
        self.seq_len = seq_length
        self.n_hidden = n_features  # number of hidden states per layer
        self.n_layers = 1           # number of stacked LSTM layers

        self.l_lstm = torch.nn.LSTM(input_size=n_features,
                                    hidden_size=self.n_hidden,
                                    num_layers=self.n_layers,
                                    batch_first=True)
        # map the flattened per-step hidden states to a single prediction
        self.l_linear = torch.nn.Linear(self.n_hidden * self.seq_len, 1)
        # initial hidden/cell state for the first forward pass
        self.hidden = self.init_hidden(batch_size)
        

    def init_hidden(self, batch_size):
        # even with batch_first=True, the hidden and cell states keep the
        # shape (num_layers, batch_size, hidden_size), as in the docs
        hidden_state = torch.zeros(self.n_layers, batch_size, self.n_hidden)
        cell_state = torch.zeros(self.n_layers, batch_size, self.n_hidden)
        return (hidden_state, cell_state)
    
    
    def forward(self, x):
        batch_size, seq_len, _ = x.size()
        # pass in the stored state and keep the updated one for the next call
        lstm_out, self.hidden = self.l_lstm(x, self.hidden)
        x = lstm_out.contiguous().view(batch_size, -1)
        return self.l_linear(x)
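For context, the training loop below assumes a setup roughly like this; the loss function, optimizer, learning rate, and hyperparameter values here are placeholders I picked, nothing special (X and y are the arrays built above):

n_features = 2        # two parallel input series
n_timesteps = 3       # window length
batch_size = 1        # keeps every batch the same size, since len(X) = 7
train_episodes = 500

mv_net = LSTMModel(n_features, n_timesteps, batch_size)
criterion = torch.nn.MSELoss()
optimizer = torch.optim.Adam(mv_net.parameters(), lr=1e-2)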


mv_net.train()
for t in range(train_episodes):
    for b in range(0, len(X), batch_size):
        inpt = X[b:b+batch_size, :, :]
        target = y[b:b+batch_size]

        x_batch = torch.tensor(inpt, dtype=torch.float32)
        y_batch = torch.tensor(target, dtype=torch.float32)

        # this is the reset I am asking about:
        # mv_net.hidden = mv_net.init_hidden(x_batch.size(0))
        output = mv_net(x_batch)
        loss = criterion(output.view(-1), y_batch)

        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    print('step :', t, 'loss :', loss.item())

It makes sense to reset the hidden state when you are working with instances or batches that are not related in any meaningful way (with respect to making predictions), e.g. translating two unrelated inputs in neural machine translation. You can think of the hidden state as a limited memory that gets muddled if the effective input is too long (and it will be if you chain multiple unrelated instances together), and as a result the final performance may decline.
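Concretely, “resetting” just means re-initialising the state at the start of every batch, roughly like this (reusing the init_hidden from your model):

for b in range(0, len(X), batch_size):
    x_batch = torch.tensor(X[b:b+batch_size], dtype=torch.float32)
    y_batch = torch.tensor(y[b:b+batch_size], dtype=torch.float32)

    # every batch starts from a blank state
    mv_net.hidden = mv_net.init_hidden(x_batch.size(0))

    output = mv_net(x_batch)
    loss = criterion(output.view(-1), y_batch)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()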

One more thing: when performing SGD, you assume the batches are independent of one another. If you don’t reset the hidden state between them, you violate that i.i.d. assumption.
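If you really want to preserve the state across batches (which only makes sense when the batches are consecutive chunks of one long, ordered series), the usual pattern is truncated backpropagation through time: keep the values of the hidden and cell state but detach them from the old graph before each new forward pass. Otherwise the second backward() will try to backpropagate through the previous batch’s graph, which has already been freed, and raise an error. A rough sketch (it assumes every batch has the same size, so the carried state always matches the input):

mv_net.hidden = mv_net.init_hidden(batch_size)  # once, before training

for b in range(0, len(X), batch_size):
    x_batch = torch.tensor(X[b:b+batch_size], dtype=torch.float32)
    y_batch = torch.tensor(y[b:b+batch_size], dtype=torch.float32)

    # keep the values, but cut the graph at the batch boundary
    mv_net.hidden = tuple(h.detach() for h in mv_net.hidden)

    output = mv_net(x_batch)
    loss = criterion(output.view(-1), y_batch)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()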


Dear mariosasko:

Thank you for your kind explanation.
I understand what you mean (a little bit).
I will read through your explanation several times in detail.
Have a nice day and see you, mariosasko.