I’ve seen two ways of handling the hidden state of an RNN.

First way:

In the class:

    self.rnn = nn.RNN(...)

    def forward(self, x, h):
        out, h = self.rnn(x, h)
        return out, h

In training:

    for epoch in epochs:
        h = torch.zeros(num_layers, batch_size, hidden_size)
        ...
        for batch in batches:
            out, h = model(x, h)
            ...
            h.detach_()
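For concreteness, here is a minimal runnable sketch of this first approach; the dimensions (`input_size`, `hidden_size`, etc.) and the two-iteration loop are just placeholder assumptions:

```python
import torch
import torch.nn as nn

# Hypothetical sizes, chosen only for illustration
input_size, hidden_size, num_layers = 4, 8, 2
batch_size, seq_len = 3, 5

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.rnn = nn.RNN(input_size, hidden_size, num_layers, batch_first=True)

    def forward(self, x, h):
        out, h = self.rnn(x, h)
        return out, h

model = Net()
# Hidden state is created once and threaded through the batches
h = torch.zeros(num_layers, batch_size, hidden_size)
for _ in range(2):  # stand-in for the batch loop
    x = torch.randn(batch_size, seq_len, input_size)
    out, h = model(x, h)
    # Detach in place so gradients do not flow back across batch boundaries
    h.detach_()
```

The key point is that the hidden state survives from one batch to the next (truncated BPTT), and `detach_()` only cuts the autograd history, not the values.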

Second way:

In the class:

    self.rnn = nn.RNN(...)

    def weight_init(self):
        self.h = torch.zeros(num_layers, batch_size, hidden_size)

    def forward(self, x):
        out, self.h = self.rnn(x, self.h)
        return out

In training:

    for epoch in epochs:
        for batch in batches:
            model.weight_init()
            out = model(x)
            ...
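And a minimal runnable sketch of the second approach, with the same placeholder sizes as assumptions; note the hidden state is zeroed before every batch here:

```python
import torch
import torch.nn as nn

# Hypothetical sizes, chosen only for illustration
input_size, hidden_size, num_layers = 4, 8, 2
batch_size, seq_len = 3, 5

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.rnn = nn.RNN(input_size, hidden_size, num_layers, batch_first=True)

    def weight_init(self):
        # Reset the stored hidden state to zeros
        self.h = torch.zeros(num_layers, batch_size, hidden_size)

    def forward(self, x):
        out, self.h = self.rnn(x, self.h)
        return out

model = Net()
for _ in range(2):  # stand-in for the batch loop
    model.weight_init()  # fresh hidden state for each batch
    x = torch.randn(batch_size, seq_len, input_size)
    out = model(x)
```

Because the state is reset every batch, no information (and no gradient) carries over between batches, which is the behavioral difference from the first approach.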

…

So, which way is correct, or what is the difference between them?

Also, in the second way, should I call weight_init for every batch or only once per epoch?

Thank you.