Why is the hidden state initialized to zero for every batch when doing a forward pass?

I have a couple of questions:

  1. Why is the hidden state initialized to zero for each batch when doing a forward pass? And when are you supposed to do this: during training, testing, or both?
  2. If I initialized the hidden state with batch_size=128 during training and the batch_size for testing is 1, would I have to initialize the hidden state to 0 with the new batch size?
def forward(self, x):
    batch_size = x.shape[0]
    # (h_0, c_0): initial hidden and cell states, one per LSTM layer
    hidden = (torch.zeros(self.layers, batch_size, self.hidden_size).to(device=device),
              torch.zeros(self.layers, batch_size, self.hidden_size).to(device=device))

    output, hidden = self.lstm(x, hidden)

    # then do whatever you want with the output

You can do it for each epoch too (instead of each batch), which I think is better. You’ll need to do it for both training and testing (and validation).
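If it helps, here is a rough sketch of the per-epoch variant. model, train_loader, criterion, optimizer, batch_size, and device are assumed to already exist (they are not from the original code), the forward signature takes hidden as an argument (as the reply further down suggests), and every batch is assumed to have the same size (e.g. drop_last=True in the DataLoader):

import torch

num_epochs = 10  # assumed value, just for the sketch

for epoch in range(num_epochs):
    # zero the (h_0, c_0) state once at the start of the epoch,
    # same shapes as in the forward() snippet above
    hidden = (torch.zeros(model.layers, batch_size, model.hidden_size, device=device),
              torch.zeros(model.layers, batch_size, model.hidden_size, device=device))

    for x, y in train_loader:
        x, y = x.to(device), y.to(device)

        # detach so backprop stops at the batch boundary instead of
        # unrolling through every earlier batch in the epoch
        hidden = tuple(h.detach() for h in hidden)

        output, hidden = model(x, hidden)  # forward takes hidden as an argument (see below)
        loss = criterion(output, y)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()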

Yes, the initialization should match the batch_size of the input.

hidden should be passed as an input to the forward function, not initialized there. Check out an example here.
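A minimal sketch of what that refactor could look like; the class and attribute names (SimpleLSTM, fc, init_hidden) are placeholders I'm assuming, not from the original post:

import torch
import torch.nn as nn

class SimpleLSTM(nn.Module):  # hypothetical name, just for illustration
    def __init__(self, input_size, hidden_size, layers, output_size):
        super().__init__()
        self.layers = layers
        self.hidden_size = hidden_size
        self.lstm = nn.LSTM(input_size, hidden_size, layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def init_hidden(self, batch_size, device):
        # fresh zero (h_0, c_0) for whatever batch size the caller is feeding in
        return (torch.zeros(self.layers, batch_size, self.hidden_size, device=device),
                torch.zeros(self.layers, batch_size, self.hidden_size, device=device))

    def forward(self, x, hidden):
        # hidden arrives from the caller instead of being re-created here
        output, hidden = self.lstm(x, hidden)
        return self.fc(output), hidden

The caller then builds the state to match whatever it is actually feeding in, so batch_size=128 for training and batch_size=1 for testing both work without touching the model:

hidden = model.init_hidden(x.shape[0], device)
output, hidden = model(x, hidden)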

Thanks, this helped a lot.