Confusion about Multi-layer Bidirectional LSTM

Hi,

I am confused about the implementation of multi-layer bidirectional LSTMs in PyTorch.

Say we have:

import torch
import torch.nn as nn

model1 = nn.LSTM(input_size=2, hidden_size=3, num_layers=2, bidirectional=True)

model1 would be a 2-layer bidirectional LSTM. For the first layer, since the hidden size is 3 and it is bidirectional, the output of the first layer will have a size of 6. Therefore the input size of the second layer should be 6, and I would expect the output size to be 6*2 = 12.

If I do:

for name, para in model1.named_parameters():
    print(name)
    print(para.size())
It prints:
weight_ih_l0
torch.Size([12, 2])
weight_hh_l0
torch.Size([12, 3])
bias_ih_l0
torch.Size([12])
bias_hh_l0
torch.Size([12])
weight_ih_l0_reverse
torch.Size([12, 2])
weight_hh_l0_reverse
torch.Size([12, 3])
bias_ih_l0_reverse
torch.Size([12])
bias_hh_l0_reverse
torch.Size([12])
weight_ih_l1
torch.Size([12, 6])
weight_hh_l1
torch.Size([12, 3])
bias_ih_l1
torch.Size([12])
bias_hh_l1
torch.Size([12])
weight_ih_l1_reverse
torch.Size([12, 6])
weight_hh_l1_reverse
torch.Size([12, 3])
bias_ih_l1_reverse
torch.Size([12])
bias_hh_l1_reverse
torch.Size([12])

The output matches my expectation. However, when I actually feed the network some data, the output it gives me has a last dimension of 6:
from torch.autograd import Variable

# (seq_len, batch, input_size) = (5, 1, 2)
random_input = Variable(torch.FloatTensor(5, 1, 2).normal_(), requires_grad=False)
out, _ = model1(random_input)
print(out.size())

The code prints torch.Size([5, 1, 6]).

Could someone please explain why that is the case? Thanks in advance!

it"s right, the “12” is not the output size,
- x: Input data of shape (N, T, D)
- h0: Initial hidden state of shape (N, H)
- Wx: Weights for input-to-hidden connections, of shape (D, 4H)
- Wh: Weights for hidden-to-hidden connections, of shape (H, 4H)
- b: Biases of shape (4H,)
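
To make this concrete, here is a small sketch (my own check, not from the original post, assuming a recent PyTorch where Variable is no longer needed) showing where the 12 and the 6 come from for this exact model:

import torch
import torch.nn as nn

hidden_size = 3
model1 = nn.LSTM(input_size=2, hidden_size=hidden_size, num_layers=2, bidirectional=True)

# Every weight/bias has a leading 4 * hidden_size = 12: the four gate
# matrices (input, forget, cell, output) are stacked together.
assert model1.weight_hh_l0.size() == torch.Size([4 * hidden_size, hidden_size])      # (12, 3)

# Layer 1 consumes the concatenated forward/backward outputs of layer 0,
# hence 2 * hidden_size = 6 input features.
assert model1.weight_ih_l1.size() == torch.Size([4 * hidden_size, 2 * hidden_size])  # (12, 6)

x = torch.randn(5, 1, 2)   # (seq_len, batch, input_size)
out, (h_n, c_n) = model1(x)
print(out.size())          # torch.Size([5, 1, 6]) -> last dim = num_directions * hidden_size
print(h_n.size())          # torch.Size([4, 1, 3]) -> (num_layers * num_directions, batch, hidden_size)

So the 12 only ever shows up inside the parameter matrices (four gates stacked), while the tensor the network actually returns always ends in num_directions * hidden_size = 6.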