When we define the forward pass, do we need to enable autograd on any tensors, such as the hidden/cell states?
For example:

```python
h0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim).requires_grad_()
c0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim).requires_grad_()
```
In some tutorials, I noticed that from beginning to end not a single tensor has requires_grad set.
Is it because the nn.LSTM module already handles this?
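For reference, here is a minimal sanity check I put together (the sizes are toy values I made up, and batch_first=True is just my choice). It seems to run and produce weight gradients without any requires_grad_() calls on the states, which is what makes me suspect nn.LSTM's parameters already carry requires_grad=True:

```python
import torch
import torch.nn as nn

# Toy sizes, purely for illustration
input_dim, hidden_dim, layer_dim = 4, 8, 2
lstm = nn.LSTM(input_dim, hidden_dim, layer_dim, batch_first=True)

# The module's learnable weights are nn.Parameters,
# which default to requires_grad=True
print(all(p.requires_grad for p in lstm.parameters()))  # True

x = torch.randn(3, 5, input_dim)  # (batch, seq_len, input_dim)

# Plain zero states, no requires_grad_() call
h0 = torch.zeros(layer_dim, x.size(0), hidden_dim)
c0 = torch.zeros(layer_dim, x.size(0), hidden_dim)

out, (hn, cn) = lstm(x, (h0, c0))
out.sum().backward()

# Gradients still reach the weights: autograd tracks any operation
# that involves a tensor with requires_grad=True, i.e. the parameters
print(lstm.weight_ih_l0.grad is not None)  # True
```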