I have a couple of questions:
- Why is the hidden state initialized to zero for each batch when doing a forward pass? And when are you supposed to this during training or testing or both?
- If you initialized the hidden state with a batch_size=128 during training and for testing the batch_size=1, would I have to initialize the hidden state to 0 with a new batch size?
def forward(self, x):
batch_size = x.shape[0]
hidden = (torch.zeros(self.layers, batch_size, self.hidden_size).to(device=device), torch.zeros(self.layers, batch_size, self.hidden_size).to(device=device))
output, hidden = lstm(x, hidden)
# then do what every you want with the output`