I am looking at the example at the bottom of this page:
```python
rnn = nn.LSTM(10, 20, 2)
input = torch.randn(5, 3, 10)
h0 = torch.randn(2, 3, 20)
c0 = torch.randn(2, 3, 20)
output, (hn, cn) = rnn(input, (h0, c0))
```
If I understand correctly:
- the LSTM has an input dimension of 10, a hidden dimension of 20 and 2 layers;
- the input is a batch of size 5, each item being a sequence of three elements of size 10 (i.e. the input dimension of the LSTM);
Am I correct?
If I am, why do I need three initial hidden tensors (h0)?
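To make the question concrete, here is the same example with the resulting shapes printed out (this just runs the snippet above and inspects the tensors, nothing more):

```python
import torch
import torch.nn as nn

rnn = nn.LSTM(10, 20, 2)          # input size 10, hidden size 20, 2 layers
input = torch.randn(5, 3, 10)
h0 = torch.randn(2, 3, 20)
c0 = torch.randn(2, 3, 20)
output, (hn, cn) = rnn(input, (h0, c0))

print(output.shape)  # torch.Size([5, 3, 20])
print(hn.shape)      # torch.Size([2, 3, 20])
print(cn.shape)      # torch.Size([2, 3, 20])
```

So whatever the 5 and the 3 mean in the input, h0 and c0 both need that 3 in their middle dimension, and that is what I would like to understand.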