Why is the hidden state initialized to zero for every batch when doing a forward pass?

I have a couple of questions:

  1. Why is the hidden state initialized to zero for each batch when doing a forward pass? And when are you supposed to do this: during training, testing, or both?
  2. If I initialized the hidden state with batch_size=128 during training and the batch_size for testing is 1, would I have to initialize the hidden state to 0 with the new batch size?
def forward(self, x):
    batch_size = x.shape[0]
    # (h_0, c_0): initial hidden and cell states, one per LSTM layer
    hidden = (torch.zeros(self.layers, batch_size, self.hidden_size).to(device=device),
              torch.zeros(self.layers, batch_size, self.hidden_size).to(device=device))

    output, hidden = self.lstm(x, hidden)

    # then do whatever you want with the output

You can do it for each epoch too (instead of each batch), which I think is better. You’ll need to do it for both training and testing (and validation).
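If it helps, here is a rough sketch of the per-epoch variant. model, train_loader, criterion, optimizer, batch_size, and device are assumed to already exist (they are not from the original code), the forward signature takes hidden as an argument (as the reply further down suggests), and every batch is assumed to have the same size (e.g. drop_last=True in the DataLoader):

import torch

num_epochs = 10  # assumed value, just for the sketch

for epoch in range(num_epochs):
    # zero the (h_0, c_0) state once at the start of the epoch,
    # same shapes as in the forward() snippet above
    hidden = (torch.zeros(model.layers, batch_size, model.hidden_size, device=device),
              torch.zeros(model.layers, batch_size, model.hidden_size, device=device))

    for x, y in train_loader:
        x, y = x.to(device), y.to(device)

        # detach so backprop stops at the batch boundary instead of
        # unrolling through every earlier batch in the epoch
        hidden = tuple(h.detach() for h in hidden)

        output, hidden = model(x, hidden)  # forward takes hidden as an argument (see below)
        loss = criterion(output, y)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()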

Yes, the initialization should match the batch_size of the input.

hidden should be passed as an input to the forward function, not initialized there. Check out an example here.
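A minimal sketch of what that refactor could look like; the class and attribute names (SimpleLSTM, fc, init_hidden) are placeholders I'm assuming, not from the original post:

import torch
import torch.nn as nn

class SimpleLSTM(nn.Module):  # hypothetical name, just for illustration
    def __init__(self, input_size, hidden_size, layers, output_size):
        super().__init__()
        self.layers = layers
        self.hidden_size = hidden_size
        self.lstm = nn.LSTM(input_size, hidden_size, layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def init_hidden(self, batch_size, device):
        # fresh zero (h_0, c_0) for whatever batch size the caller is feeding in
        return (torch.zeros(self.layers, batch_size, self.hidden_size, device=device),
                torch.zeros(self.layers, batch_size, self.hidden_size, device=device))

    def forward(self, x, hidden):
        # hidden arrives from the caller instead of being re-created here
        output, hidden = self.lstm(x, hidden)
        return self.fc(output), hidden

The caller then builds the state to match whatever it is actually feeding in, so batch_size=128 for training and batch_size=1 for testing both work without touching the model:

hidden = model.init_hidden(x.shape[0], device)
output, hidden = model(x, hidden)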

Thanks, this helped a lot.