```python
def init_hidden(self, bsz, requires_grad=True):
    weight = next(self.parameters())
    if self.rnn_type == 'LSTM':
        return (weight.new_zeros((self.nlayers, bsz, self.nhid), requires_grad=requires_grad),
                weight.new_zeros((self.nlayers, bsz, self.nhid), requires_grad=requires_grad))
    else:
        return weight.new_zeros((self.nlayers, bsz, self.nhid), requires_grad=requires_grad)
```

This code uses the device and data type of the model's first parameter (here called `weight`) to create the initial hidden state as zeros (and, for an LSTM, the cell state as well), and returns those tensors.
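To see this in context, here is a minimal sketch of a model that uses such an `init_hidden` method. The model class and its sizes (`TinyRNN`, `ninp`, `nhid`, `nlayers`) are hypothetical names chosen for illustration; the point is that `new_zeros` inherits the dtype and device from the parameter it is called on, so the hidden state automatically matches the model:

```python
import torch

# Hypothetical minimal model, for illustration only.
class TinyRNN(torch.nn.Module):
    def __init__(self, ninp=4, nhid=8, nlayers=2, rnn_type="LSTM"):
        super().__init__()
        self.rnn_type = rnn_type
        self.nhid = nhid
        self.nlayers = nlayers
        self.rnn = getattr(torch.nn, rnn_type)(ninp, nhid, nlayers)

    def init_hidden(self, bsz, requires_grad=True):
        # Any parameter works here; it only supplies dtype and device.
        weight = next(self.parameters())
        if self.rnn_type == 'LSTM':
            return (weight.new_zeros((self.nlayers, bsz, self.nhid), requires_grad=requires_grad),
                    weight.new_zeros((self.nlayers, bsz, self.nhid), requires_grad=requires_grad))
        else:
            return weight.new_zeros((self.nlayers, bsz, self.nhid), requires_grad=requires_grad)

model = TinyRNN().double()       # move all parameters to float64
h, c = model.init_hidden(bsz=3)
print(h.shape, h.dtype)          # the zeros follow the parameters' dtype
```

If the model were later moved to a GPU with `model.cuda()`, the same `init_hidden` call would produce CUDA tensors with no code changes, which is the reason for reading `next(self.parameters())` instead of calling `torch.zeros` directly.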

I’m not sure if this is the right place to ask, but I’ve been studying LSTMs and I don’t understand how the hidden state size can be a hyper-parameter in an LSTM. Shouldn’t the hidden state’s shape be fixed to the number of outputs × the number of layers?

Thank you very much for your answer, now I understand

I don’t quite understand your problem. This code just initializes the hidden state to zeros before a forward pass; the hidden state itself is not a learned parameter. The useful values it takes afterwards are computed during the forward pass from the inputs and the trained weights.
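To illustrate the distinction the reply is drawing, here is a short sketch using a standalone `torch.nn.LSTM`: the learned parameters are the weight and bias matrices, while the hidden state is an activation that the forward pass produces from the input and the zero-initialized state:

```python
import torch

rnn = torch.nn.LSTM(input_size=4, hidden_size=8, num_layers=2)

# The learned parameters are weight/bias tensors only:
param_names = [name for name, _ in rnn.named_parameters()]
print(param_names)  # e.g. 'weight_ih_l0', 'weight_hh_l0', 'bias_ih_l0', ...

# The hidden state is *not* among them; it is recomputed every forward pass.
x = torch.randn(5, 3, 4)        # (seq_len, batch, input_size)
h0 = torch.zeros(2, 3, 8)       # (num_layers, batch, hidden_size)
c0 = torch.zeros(2, 3, 8)
out, (hn, cn) = rnn(x, (h0, c0))
print(hn.shape)                 # final hidden state per layer and batch element
```

So `hidden_size` is a hyper-parameter you choose when building the model, but the hidden state *values* are not trained directly; they are derived from the inputs through the trained weights.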