I understand that for an RNN, the hidden state should be initialized to zero. However, I have seen code that initializes the hidden state once per epoch rather than once per batch (to be clear, each epoch contains multiple batches).
For example, the following code initializes the hidden state at the start of each epoch, not at the start of each batch. Is there a specific reason for this, or is my initial understanding wrong?

    # train for some number of epochs
    for e in range(epochs):
        # initialize hidden state
        h = net.init_hidden(batch_size)

        # batch loop
        for inputs, labels in train_loader:
            # Creating new variables for the hidden state, otherwise
            # we'd backprop through the entire training history
            h = tuple([each.data for each in h])

            # zero accumulated gradients
            net.zero_grad()

            # get the output from the model
            output, h = net(inputs, h)

            # calculate the loss and perform backprop
            loss = criterion(output.squeeze(), labels.float())
            loss.backward()
            # `clip_grad_norm` helps prevent the exploding gradient problem in RNNs / LSTMs.
            nn.utils.clip_grad_norm_(net.parameters(), clip)
            optimizer.step()
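
To make the two options concrete, here is a minimal sketch of what I am comparing. It is not the code above: the standalone `init_hidden` helper, the single-layer `nn.LSTM`, the tensor sizes, and the random batches are all assumptions made for illustration, and no training step is performed. Variant A carries the (detached) hidden state from one batch to the next, as in the snippet above; variant B resets it to zero for every batch, which is what I expected to see.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Assumed sizes, chosen only for this illustration
batch_size, seq_len, n_features, n_hidden = 4, 5, 3, 8
lstm = nn.LSTM(n_features, n_hidden, batch_first=True)


def init_hidden(batch_size):
    # Hypothetical stand-in for net.init_hidden: zero (h, c) for a 1-layer LSTM
    return (torch.zeros(1, batch_size, n_hidden),
            torch.zeros(1, batch_size, n_hidden))


# Three random batches standing in for train_loader
batches = [torch.randn(batch_size, seq_len, n_features) for _ in range(3)]

# Variant A: initialize once, then carry the state across batches.
# detach() cuts the graph so gradients would not flow into earlier batches.
h = init_hidden(batch_size)
carried = []
for x in batches:
    h = tuple(each.detach() for each in h)
    out, h = lstm(x, h)
    carried.append(out)

# Variant B: reset the state to zero at the start of every batch.
reset = []
for x in batches:
    h = init_hidden(batch_size)
    out, h = lstm(x, h)
    reset.append(out)

# The first batch starts from zeros in both variants; later batches do not.
first_batch_same = torch.allclose(carried[0], reset[0])
later_batch_same = torch.allclose(carried[1], reset[1])
print(first_batch_same, later_batch_same)
```

In this sketch the two variants agree on the first batch and differ afterwards, because variant A feeds the final state of the previous batch into the next one. As far as I can tell, that would only be meaningful if row `i` of each batch continues the sequence in row `i` of the previous batch, which is part of what I am asking about.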