Why pass an empty internal state to the RNN at each evaluation in the PyTorch tutorial?

I am reading the PyTorch tutorial on name/country classification (Classifying Names with a Character-Level RNN). After training, in the "Evaluating the Results" section, the evaluate() function takes a name and predicts which language it belongs to. But it passes a hidden state of all zeros as input. My question is: we already have a trained internal state from training, so should we always reuse it, or discard it every time? What is the recommended treatment of the internal state across different evaluations?

# Just return an output given a line
def evaluate(line_tensor):
    # reset the hidden state to zeros for each new name
    hidden = rnn.initHidden()

    # feed the name one character at a time, carrying the
    # hidden state forward within this single sequence
    for i in range(line_tensor.size(0)):
        output, hidden = rnn(line_tensor[i], hidden)

    # the output after the last character is the prediction
    return output

Thanks

Hi @koodailar,

(…) we already have a trained internal state from training (…)

Why do you say so? Have you trained the initial state of the RNN as a model parameter?
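For contrast, if you did want a trained initial state, it would have to be registered as a parameter so the optimizer updates it along with the weights. Here is a minimal sketch of that variant, following the tutorial's RNN layout (the class name RNNWithLearnedInit is hypothetical, and this is not what the tutorial does):

import torch
import torch.nn as nn

class RNNWithLearnedInit(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.i2h = nn.Linear(input_size + hidden_size, hidden_size)
        self.i2o = nn.Linear(input_size + hidden_size, output_size)
        self.softmax = nn.LogSoftmax(dim=1)
        # the initial hidden state is a learnable parameter,
        # updated by backprop like any other weight
        self.h0 = nn.Parameter(torch.zeros(1, hidden_size))

    def forward(self, input, hidden):
        combined = torch.cat((input, hidden), 1)
        hidden = self.i2h(combined)
        output = self.softmax(self.i2o(combined))
        return output, hidden

    def initHidden(self):
        # return the learned start state instead of fixed zeros
        return self.h0

Unless you set something like this up explicitly, rnn.initHidden() in the tutorial just builds a fresh tensor of zeros; there is no trained state to reuse.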

In most applications, the internal state is reset (to zero) after each sequence. The internal state exists to carry information from the "past" into the "future", i.e. information from earlier inputs that conditions the probability distribution over later outputs. In your example, each input sequence (surname) is completely independent of the others, so it makes sense to reset the state to zero to flush that memory between names.
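Concretely, with the tutorial's evaluate() above, each call starts from zeros, so the prediction for one name cannot contaminate the prediction for the next. A quick sketch (assuming the tutorial's lineToTensor is in scope):

# each surname is an independent sequence, so every call to
# evaluate() starts from a fresh zero hidden state
for name in ["Dovesky", "Jackson", "Satoshi"]:
    output = evaluate(lineToTensor(name))
    # the prediction for this name is unaffected by earlier names

# reusing the final hidden state of one name as the initial state
# of the next would let characters of the first name leak into
# the prediction for the second

Carrying state across calls only makes sense when consecutive inputs really are parts of one long sequence, e.g. successive chunks of a single text in language modelling.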