Why does hidden_state in the rnn module have a "batch" dimension?

tiantong · November 13, 2017, 12:35pm

Why is the dimension of hidden_state in the rnn module tied to the “batch size”? If I train an LSTM with batch size 10 and then test my model with batch size 1, how can I use the pretrained hidden_state of the LSTM network?

dpernes · November 13, 2017, 12:42pm

What do you mean by “pretrained hidden state”?

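The hidden state of an RNN (vanilla, LSTM, GRU or any other kind that you may like) is not supposed to “live forever”. In principle, it should be reset to an empty state (all zeros) after a whole sequence has been processed. It has a batch dimension because each sequence in a batch carries its own hidden state; what training actually learns are the weights, and those do not depend on the batch size.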
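As a rough illustration of the point above, here is a minimal sketch using `torch.nn.LSTM`. The sizes (8 input features, 16 hidden units, sequence length 5) are arbitrary assumptions chosen for the example, not values from the question. It runs the same module on a batch of 10 and on a batch of 1, and checks that omitting the initial state behaves the same as passing zeros explicitly.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes, chosen only for illustration.
input_size, hidden_size, num_layers, seq_len = 8, 16, 1, 5

lstm = nn.LSTM(input_size=input_size, hidden_size=hidden_size,
               num_layers=num_layers)

# Default input layout is (seq_len, batch, input_size).
train_batch = torch.randn(seq_len, 10, input_size)  # "training" batch size 10
test_batch = torch.randn(seq_len, 1, input_size)    # "test" batch size 1

# No initial state passed: the module starts from zeros for every sequence.
out_train, (h_train, c_train) = lstm(train_batch)
out_test, (h_test, c_test) = lstm(test_batch)

# The hidden state has one slot per sequence in the batch.
print(h_train.shape)  # (num_layers, 10, hidden_size)
print(h_test.shape)   # (num_layers, 1, hidden_size)

# Passing an explicit all-zero state sized for the current batch
# should give the same result as omitting it.
h0 = torch.zeros(num_layers, 1, hidden_size)
c0 = torch.zeros(num_layers, 1, hidden_size)
out_explicit, _ = lstm(test_batch, (h0, c0))
```

The same `lstm` object handles both batch sizes because its parameters are shaped by `input_size` and `hidden_size` only; the hidden state is created fresh for each batch rather than stored with the model.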

tiantong · November 13, 2017, 1:02pm

Thanks, I understand.