I’ve seen two ways of handling the hidden state of an RNN.
First way:

In the class:

    self.rnn = nn.RNN(...)

    def forward(self, x, h):
        out, h = self.rnn(x, h)
        return out, h
In training:

    for epoch in range(num_epochs):
        h = torch.zeros(num_layers, batch_size, hidden_size)
        ...
        for x, y in loader:  # batch loop
            out, h = model(x, h)
            ...
            h.detach_()
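For reference, here is a minimal runnable sketch of this first pattern. All the sizes, the model name, and the dummy data are my own assumptions, not from any real project; the point is just that the hidden state is created once per epoch, passed through the batch loop, and detached so backprop is truncated at batch boundaries:

```python
import torch
import torch.nn as nn

class RNNModel(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers):
        super().__init__()
        self.rnn = nn.RNN(input_size, hidden_size, num_layers, batch_first=True)

    def forward(self, x, h):
        # pass the previous hidden state in, return the new one
        out, h = self.rnn(x, h)
        return out, h

# hypothetical sizes, chosen only for illustration
input_size, hidden_size, num_layers = 4, 8, 2
batch_size, seq_len = 3, 5
model = RNNModel(input_size, hidden_size, num_layers)

for epoch in range(2):
    # fresh hidden state once per epoch
    h = torch.zeros(num_layers, batch_size, hidden_size)
    for _ in range(4):  # batch loop with dummy data
        x = torch.randn(batch_size, seq_len, input_size)
        out, h = model(x, h)
        # keep the hidden values, but drop the autograd graph
        # so gradients do not flow across batch boundaries
        h = h.detach()
```

With this pattern the state carries across batches inside an epoch, which is why the `detach` is needed: without it, `loss.backward()` on a later batch would try to backpropagate through all earlier batches.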
Second way:

In the class:

    self.rnn = nn.RNN(...)

    def weight_init(self):
        self.h = torch.zeros(num_layers, batch_size, hidden_size)

    def forward(self, x):
        out, self.h = self.rnn(x, self.h)
        return out
In training:

    for epoch in range(num_epochs):
        for x, y in loader:  # batch loop
            model.weight_init()
            out = model(x)
            ...
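The second pattern as a runnable sketch, again with made-up sizes and dummy data (I kept the name `weight_init`, although it resets the hidden state rather than the weights). Here the hidden state lives on the module and is zeroed at the top of every batch:

```python
import torch
import torch.nn as nn

class StatefulRNNModel(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, batch_size):
        super().__init__()
        self.rnn = nn.RNN(input_size, hidden_size, num_layers, batch_first=True)
        self.num_layers = num_layers
        self.batch_size = batch_size
        self.hidden_size = hidden_size

    def weight_init(self):
        # reset the stored hidden state to zeros
        self.h = torch.zeros(self.num_layers, self.batch_size, self.hidden_size)

    def forward(self, x):
        # use and update the hidden state stored on the module
        out, self.h = self.rnn(x, self.h)
        return out

# hypothetical sizes, for illustration only
model = StatefulRNNModel(input_size=4, hidden_size=8, num_layers=2, batch_size=3)

for epoch in range(2):
    for _ in range(4):  # batch loop with dummy data
        model.weight_init()  # zero the hidden state at the start of each batch
        x = torch.randn(3, 5, 4)
        out = model(x)
```

Called per batch like this, each batch starts from a zero state, so nothing carries over between batches and no `detach` is needed; called only once per epoch, the stored `self.h` would keep its autograd graph across batches, which is exactly the situation the question below is about.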
So, which way is correct, and what is the difference between them?
Also, in the second way, should I call weight_init() once per batch or once per epoch?
Thank you.