When to initialize RNN hidden state? Every epoch or every batch?

I understand that for an RNN we should initialize the hidden state to zero. But I have seen code that initializes the hidden state once per epoch rather than once per batch (to be clear, each epoch contains multiple batches).

For example, the following code initializes the hidden state once per epoch rather than once per batch. Is there a specific reason for this, or is my initial understanding wrong?

# train for some number of epochs
for e in range(epochs):
    # initialize hidden state
    h = net.init_hidden(batch_size)

    # batch loop
    for inputs, labels in train_loader:
        # Creating new variables for the hidden state, otherwise
        # we'd backprop through the entire training history
        h = tuple([each.data for each in h])

        # zero accumulated gradients
        net.zero_grad()

        # get the output from the model
        output, h = net(inputs, h)

        # calculate the loss and perform backprop
        loss = criterion(output.squeeze(), labels.float())
        loss.backward()
        # `clip_grad_norm_` helps prevent the exploding gradient problem in RNNs / LSTMs
        nn.utils.clip_grad_norm_(net.parameters(), clip)
        optimizer.step()
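
For comparison, this is how I expected the per-batch version to look (a sketch reusing the same net, train_loader, criterion, optimizer, and clip from the snippet above; since a fresh zero state carries no history, the detach step via `each.data` would no longer be needed):

# per-batch variant I had in mind (same setup assumed as above)
for e in range(epochs):
    for inputs, labels in train_loader:
        # re-initialize the hidden state to zeros for every batch,
        # so no history is carried across batch boundaries
        h = net.init_hidden(batch_size)

        # zero accumulated gradients
        net.zero_grad()

        # get the output from the model
        output, h = net(inputs, h)

        # calculate the loss and perform backprop
        loss = criterion(output.squeeze(), labels.float())
        loss.backward()
        nn.utils.clip_grad_norm_(net.parameters(), clip)
        optimizer.step()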