What happens when we don't pass h0, c0 to LSTM, RNN, etc.?

Khabbab_Zakaria · November 15, 2021, 10:06pm

I was playing with nn.LSTM, nn.RNN, etc. and noticed something.
The documentation gives this example:

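```python
import torch
import torch.nn as nn

rnn = nn.LSTM(10, 20, 2)        # input_size=10, hidden_size=20, num_layers=2
input = torch.randn(5, 3, 10)   # (seq_len, batch, input_size)
h0 = torch.randn(2, 3, 20)      # (num_layers, batch, hidden_size)
c0 = torch.randn(2, 3, 20)
output, (hn, cn) = rnn(input, (h0, c0))
print(output.shape)
# torch.Size([5, 3, 20])
```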

Then I tried the same thing without h0 and c0:

```python
import torch
import torch.nn as nn

input = torch.randn(5, 3, 10)
rnn = nn.LSTM(10, 20, 2)
output, _ = rnn(input)          # no (h0, c0) passed
print(output.shape)
# torch.Size([5, 3, 20])
```

I wonder what is happening here. Is the model assuming some h0 and c0 by itself?

thecho7 · November 16, 2021, 1:15am

Hi, here is the relevant excerpt from the PyTorch docs for nn.LSTM:

h_0: tensor of shape (D * num_layers, N, H_out) containing the initial hidden state for each element in the batch. Defaults to zeros if (h_0, c_0) is not provided.
c_0: tensor of shape (D * num_layers, N, H_cell) containing the initial cell state for each element in the batch. Defaults to zeros if (h_0, c_0) is not provided.
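To make the shape notation concrete: in the nn.LSTM docs, D is 2 for a bidirectional module and 1 otherwise, N is the batch size, H_out is proj_size when proj_size > 0 and hidden_size otherwise, and H_cell is always hidden_size. The sketch below builds the initial states by hand from those definitions. It assumes PyTorch 1.8 or newer (where proj_size exists) and the default batch_first=False; the sizes themselves are arbitrary.

```python
import torch
import torch.nn as nn

# Bidirectional (D = 2), 2 layers, with a projection so that H_out != H_cell.
# proj_size needs PyTorch >= 1.8 (assumed here).
lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=2,
               bidirectional=True, proj_size=5)
x = torch.randn(5, 3, 10)  # (L, N, H_in) because batch_first=False

D = 2 if lstm.bidirectional else 1
N = x.size(1)
h0 = torch.zeros(D * lstm.num_layers, N, lstm.proj_size)    # H_out = proj_size
c0 = torch.zeros(D * lstm.num_layers, N, lstm.hidden_size)  # H_cell = hidden_size

output, (hn, cn) = lstm(x, (h0, c0))
print(output.shape)  # torch.Size([5, 3, 10]), i.e. (L, N, D * H_out)
print(hn.shape)      # torch.Size([4, 3, 5])
print(cn.shape)      # torch.Size([4, 3, 20])
```

Without proj_size, H_out and H_cell are both hidden_size, which is why h0 and c0 have the same shape in the example from the question.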

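When you don't pass h_0 and c_0, the module automatically creates zero-filled tensors of these shapes, taking the batch size N from your input and D, num_layers, H_out and H_cell from the module's own configuration. nn.RNN and nn.GRU do the same for h_0.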
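A quick way to check this is to run the same module once with explicit zero states and once with nothing, then compare. This is a minimal sketch, assuming CPU execution (so both runs are deterministic) and the same sizes as in the question.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(5, 3, 10)  # (seq_len, batch, input_size)

# LSTM: explicit zero states vs. passing nothing.
lstm = nn.LSTM(10, 20, 2)
h0 = torch.zeros(2, 3, 20)  # (D * num_layers, N, H_out) with D = 1
c0 = torch.zeros(2, 3, 20)  # (D * num_layers, N, H_cell)
out_explicit, (hn_explicit, cn_explicit) = lstm(x, (h0, c0))
out_default, (hn_default, cn_default) = lstm(x)

# nn.RNN has no cell state, so it takes only h0.
rnn = nn.RNN(10, 20, 2)
rnn_out_explicit, _ = rnn(x, torch.zeros(2, 3, 20))
rnn_out_default, _ = rnn(x)

print(torch.allclose(out_explicit, out_default))          # True
print(torch.allclose(hn_explicit, hn_default))            # True
print(torch.allclose(cn_explicit, cn_default))            # True
print(torch.allclose(rnn_out_explicit, rnn_out_default))  # True
```

One consequence worth keeping in mind: every call without (h0, c0) starts from zeros, so if you feed a long sequence in chunks and want the state to carry over, you have to pass the previous (hn, cn) back in yourself.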

Thanks