Clarification on LSTM

I am looking at the example at the bottom of this page:

import torch
import torch.nn as nn

rnn = nn.LSTM(10, 20, 2)
input = torch.randn(5, 3, 10)
h0 = torch.randn(2, 3, 20)
c0 = torch.randn(2, 3, 20)
output, (hn, cn) = rnn(input, (h0, c0))

If I understand correctly:

  1. the LSTM has an input dimension of 10, a hidden dimension of 20 and 2 layers;
  2. the input is a batch of size 5, each item being a sequence of three elements of size 10 (i.e. the input dimension of the LSTM);
    Am I correct?
    If I am, why do I need three initial hidden tensors (h0)?


Per the docs, unless you specify batch_first=True, the input to the LSTM is shaped as (sequence length, batch size, number of input features). So in this example the sequence length is 5 (not the batch size), the batch size is 3, and the number of input features is 10.
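If you prefer to have the batch as the first dimension, here's a minimal sketch with the same sizes but batch_first=True. Note that h0 and c0 keep the (num_layers, batch, hidden) layout either way; batch_first only affects the input and output tensors:

```python
import torch
import torch.nn as nn

# Same sizes as the example above, but with batch_first=True,
# so the input is (batch, seq_len, input_size).
rnn = nn.LSTM(10, 20, 2, batch_first=True)
input = torch.randn(3, 5, 10)   # batch of 3 sequences, each 5 steps of 10 features
h0 = torch.randn(2, 3, 20)      # h0/c0 stay (num_layers, batch, hidden) regardless
c0 = torch.randn(2, 3, 20)
output, (hn, cn) = rnn(input, (h0, c0))
print(output.shape)             # torch.Size([3, 5, 20])
```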

h0 represents the initial hidden state, and you need one per layer per sequence in the batch (not one per time step — during processing, the hidden state at each step comes from the previous step). It's shaped as (number of LSTM layers, batch size, hidden size), which in this case is precisely (2, 3, 20).
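You can confirm all of this by printing the shapes from the original example:

```python
import torch
import torch.nn as nn

rnn = nn.LSTM(10, 20, 2)
input = torch.randn(5, 3, 10)          # (seq_len=5, batch=3, input_size=10)
h0 = torch.randn(2, 3, 20)             # (num_layers=2, batch=3, hidden_size=20)
c0 = torch.randn(2, 3, 20)
output, (hn, cn) = rnn(input, (h0, c0))
print(output.shape)  # torch.Size([5, 3, 20]) -- top layer's hidden state at every step
print(hn.shape)      # torch.Size([2, 3, 20]) -- final hidden state of each layer
```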

Hope this helps!