RNN: Is it advisable to initialize the hidden state only once during training?

Is it better to initialize the hidden state only once when training an RNN, or to reinitialize it in every iteration of the training loop?
For example, the tutorial code here initializes the hidden state in every iteration:
https://pytorch.org/tutorials/intermediate/char_rnn_classification_tutorial

def train(category_tensor, line_tensor):
    hidden = rnn.initHidden()  # fresh hidden state for each training example

    rnn.zero_grad()

    # Feed the sequence one character at a time, carrying the hidden state forward
    for i in range(line_tensor.size()[0]):
        output, hidden = rnn(line_tensor[i], hidden)

    # Only the output after the final character is compared with the category
    loss = criterion(output, category_tensor)
    loss.backward()

    # Add parameters' gradients to their values, multiplied by learning rate
    for p in rnn.parameters():
        p.data.add_(p.grad.data, alpha=-learning_rate)

    return output, loss.item()

## Training loop
for iter in range(1, n_iters + 1):
    category, line, category_tensor, line_tensor = randomTrainingExample()
    output, loss = train(category_tensor, line_tensor)
    current_loss += loss

In some other examples, the initialization happens outside the training loop, so the hidden state keeps its value from the previous iteration.
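That stateful pattern usually looks roughly like the sketch below. This is a minimal illustration with hypothetical shapes and dummy data, not code from any particular example; the detach() call is the standard way to truncate backpropagation through time so gradients do not flow across iteration boundaries:

import torch
import torch.nn as nn

rnn = nn.RNN(input_size=8, hidden_size=16)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(rnn.parameters(), lr=0.005)

hidden = torch.zeros(1, 1, 16)  # initialized once, outside the training loop

for step in range(100):
    seq = torch.randn(5, 1, 8)      # dummy (seq_len, batch, input_size) input
    target = torch.randn(5, 1, 16)  # dummy target for the per-step outputs

    # Keep the hidden state's value but cut the autograd graph, so
    # backpropagation is truncated at the iteration boundary
    hidden = hidden.detach()

    output, hidden = rnn(seq, hidden)
    loss = criterion(output, target)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()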

Think of the hidden state as the RNN’s “memory.” It’s like a running record of what the network has seen so far in a sequence. This memory is crucial because it allows the RNN to understand the context of the current input.
Imagine reading a sentence: “The cat sat on the…”. Your brain (like the hidden state) remembers “the cat sat on the” to predict the next word, maybe “mat” or “couch.” If you forgot everything after each word, you’d have no idea what was going on!
Initializing the hidden state at the start of each sequence is like starting a new sentence or paragraph. You need a fresh memory for each new piece of information.
Now, a code example that resets the hidden state more aggressively might be doing something different, such as:

  • Focusing on individual time steps: Perhaps it is designed to analyze each input in isolation, without considering the surrounding context.
  • Illustrating a simplified scenario: It could be a basic example meant to demonstrate how an RNN behaves at a single time step.

But in most cases, you want to preserve the hidden state throughout a sequence to capture the all-important dependencies between time steps. It's like remembering the whole sentence to understand its meaning, not just the individual words.
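To make that concrete, here is a minimal sketch (hypothetical shapes, dummy data) contrasting the two behaviors with torch.nn.RNN: pattern A carries the hidden state across time steps, so step t sees the context of everything before it; pattern B resets it at every step, so each input is processed in isolation:

import torch
import torch.nn as nn

rnn = nn.RNN(input_size=8, hidden_size=16)
seq = torch.randn(5, 1, 8)  # dummy (seq_len, batch, input_size) sequence

# Pattern A: carry the hidden state across time steps, so step t
# sees the context of steps 0..t-1 (the usual within-sequence behavior)
hidden = torch.zeros(1, 1, 16)
for t in range(seq.size(0)):
    out_a, hidden = rnn(seq[t:t+1], hidden)

# Pattern B: reset the hidden state at every step, so each input
# is processed in isolation and all context is lost
for t in range(seq.size(0)):
    out_b, _ = rnn(seq[t:t+1], torch.zeros(1, 1, 16))

With pattern B, the output at the last step depends only on the last input, which is why per-step resets destroy sequence-level meaning.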