I understand that for an RNN we should initialize the hidden state to zero. However, I have seen code that initializes the hidden state once per epoch rather than once per batch (to be clear, each epoch contains multiple batches).
For example, the following code initializes the hidden state at the start of each epoch instead of for each batch. Is there a specific reason for this, or is my initial understanding wrong?
# train for some number of epochs
for e in range(epochs):
    # initialize hidden state
    h = net.init_hidden(batch_size)

    # batch loop
    for inputs, labels in train_loader:
        # Creating new variables for the hidden state, otherwise
        # we'd backprop through the entire training history
        h = tuple([each.data for each in h])

        # zero accumulated gradients
        net.zero_grad()

        # get the output from the model
        output, h = net(inputs, h)

        # calculate the loss and perform backprop
        loss = criterion(output.squeeze(), labels.float())
        loss.backward()
        # `clip_grad_norm_` helps prevent the exploding gradient problem in RNNs / LSTMs.
        nn.utils.clip_grad_norm_(net.parameters(), clip)
        optimizer.step()
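
To make the question concrete, here is a minimal sketch of what the `h = tuple([each.data for each in h])` line appears to do. It assumes `net` wraps an `nn.LSTM` and that `init_hidden` returns a tuple of zero tensors; the toy sizes and variable names below are hypothetical and only for illustration. It uses `.detach()`, which for this purpose behaves like `.data`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy sizes, chosen only for illustration
batch_size, seq_len, n_features, n_hidden = 4, 5, 3, 8
lstm = nn.LSTM(n_features, n_hidden, batch_first=True)

# Zero initial state, assumed to match what init_hidden(batch_size) returns
h = (torch.zeros(1, batch_size, n_hidden),
     torch.zeros(1, batch_size, n_hidden))

# First batch: the returned state is attached to the autograd graph
_, h = lstm(torch.randn(batch_size, seq_len, n_features), h)
attached = h[0].requires_grad

# Detach: the values are kept, but the graph history is dropped
h_detached = tuple(each.detach() for each in h)
values_kept = torch.equal(h_detached[0], h[0])
graph_cut = not h_detached[0].requires_grad

# The state passed to the next batch is no longer zero
state_is_nonzero = bool(h_detached[0].abs().sum() > 0)
```

So in the loop above, the hidden state values are carried from one batch to the next within an epoch and only the gradient history is cut; the state is reset to zero once, at the start of each epoch. Carrying the state this way makes sense only if consecutive batches are contiguous pieces of the same sequences.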