Why does hidden_state in the RNN module have a "batch" dimension?

Why is the dimension of hidden_state in the RNN module associated with the batch size? If I train an LSTM with batch size 10 and then test my model with batch size 1, how can I use the pretrained hidden_state of the LSTM network?

What do you mean by “pretrained hidden state”?

The hidden state of an RNN (vanilla, LSTM, GRU, or any other variant you may like) is not supposed to "live forever". In principle, it should be reset to an empty state (all zeros) after a whole sequence has been processed. The pretrained part of the network is its weights; the hidden state is just temporary working memory, so its batch dimension simply matches whatever batch you are currently feeding in.
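A minimal sketch of this idea in PyTorch (the layer sizes here are arbitrary, chosen only for illustration): the same trained LSTM weights are used with a batch of 10 and with a batch of 1, and in each case a fresh all-zero hidden/cell state is created whose batch dimension matches the current input.

```python
import torch
import torch.nn as nn

# Arbitrary example sizes (assumptions, not from the thread).
INPUT_SIZE, HIDDEN_SIZE, NUM_LAYERS = 8, 16, 1

lstm = nn.LSTM(input_size=INPUT_SIZE, hidden_size=HIDDEN_SIZE,
               num_layers=NUM_LAYERS)

def run_sequence(batch_size, seq_len=5):
    # Fresh all-zero hidden and cell states, sized for THIS batch.
    # The state is not a learned parameter, so nothing "pretrained"
    # is lost by rebuilding it per sequence.
    h0 = torch.zeros(NUM_LAYERS, batch_size, HIDDEN_SIZE)
    c0 = torch.zeros(NUM_LAYERS, batch_size, HIDDEN_SIZE)
    x = torch.randn(seq_len, batch_size, INPUT_SIZE)  # (seq, batch, feat)
    out, (hn, cn) = lstm(x, (h0, c0))
    return out.shape, hn.shape

print(run_sequence(10))  # training-style batch of 10
print(run_sequence(1))   # test-time batch of 1: same weights, new zero state
```

The same `lstm` module (same weights) handles both calls; only the zero-initialized state is rebuilt to match the batch size. If you omit the `(h0, c0)` argument entirely, PyTorch initializes the state to zeros for you with the correct batch dimension.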

Thanks, I understand.