In the following very simple example, why does the hidden state consist of 2 Tensors? From what I understand, isn't it supposed to be just a single Tensor of size 20?
import torch
import torch.autograd as autograd
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.autograd import Variable
rnn = nn.LSTM(input_size=10, hidden_size=20)
input = Variable(torch.randn(50, 1, 10))  # seq_len x batch x input_size
output, hn = rnn(input)
print(hn)
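
For reference, here is a minimal sketch of what the second return value contains. It assumes a single-layer, unidirectional LSTM and a PyTorch version where plain tensors can be passed without wrapping them in `Variable`. The names `x`, `h_n` and `c_n` are mine, not from the snippet above. `nn.LSTM` returns the output sequence plus a tuple of two tensors: the final hidden state and the final cell state, which is why two Tensors are printed.

```python
import torch
import torch.nn as nn

rnn = nn.LSTM(input_size=10, hidden_size=20)
x = torch.randn(50, 1, 10)  # seq_len x batch x input_size

# The second return value is a tuple: (final hidden state, final cell state)
output, (h_n, c_n) = rnn(x)

print(output.shape)  # torch.Size([50, 1, 20]) -> hidden state at every time step
print(h_n.shape)     # torch.Size([1, 1, 20])  -> hidden state after the last step
print(c_n.shape)     # torch.Size([1, 1, 20])  -> cell state after the last step

# With one layer and one direction, the last output step matches h_n
same_as_last_output = torch.allclose(output[-1], h_n[0])
print(same_as_last_output)
```

So the "Tensor of size 20" from the question would correspond to `h_n[0, 0]` here; the extra Tensor is the LSTM's cell state, which a plain RNN or GRU does not have.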