According to the PyTorch LSTM documentation:
- LSTM.weight_ih_l[k] – the learnable input-hidden weights of the k-th layer (W_ii|W_if|W_ig|W_io), of shape (4*hidden_size, input_size) for k = 0. Otherwise, the shape is (4*hidden_size, num_directions * hidden_size).
My question is: why, for k > 0, is the shape of each gate's weight (hidden_size, num_directions * hidden_size)? As I understand it, it should be (hidden_size, num_directions * proj_size), because every layer above the lowest one receives as its input the output of the layer below it, which has shape (L, N, num_directions * proj_size).
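
To make the question concrete, here is a small sketch of the shapes I would expect from that reasoning. The helper `expected_weight_ih_shape` is a hypothetical function of my own, not part of PyTorch, and it assumes that one direction's output has width proj_size when proj_size > 0 and hidden_size otherwise.

```python
def expected_weight_ih_shape(layer, input_size, hidden_size,
                             proj_size=0, bidirectional=False):
    """Shape I would expect for the stacked weight_ih of one layer.

    Hypothetical helper written for this question; it is not a PyTorch API.
    """
    num_directions = 2 if bidirectional else 1
    # Width of one direction's output: proj_size if a projection is used,
    # otherwise hidden_size (assumption based on the output shape above).
    out_size = proj_size if proj_size > 0 else hidden_size
    # Layer 0 reads the raw input; higher layers read the concatenated
    # directions of the layer below.
    in_features = input_size if layer == 0 else num_directions * out_size
    # The four gate matrices (W_ii|W_if|W_ig|W_io) are stacked row-wise.
    return (4 * hidden_size, in_features)


# Bidirectional LSTM with a projection: layer 0, then layer 1.
print(expected_weight_ih_shape(0, input_size=8, hidden_size=20,
                               proj_size=5, bidirectional=True))  # (80, 8)
print(expected_weight_ih_shape(1, input_size=8, hidden_size=20,
                               proj_size=5, bidirectional=True))  # (80, 10)
# Same layer 1 without a projection, matching the quoted sentence.
print(expected_weight_ih_shape(1, input_size=8, hidden_size=20,
                               bidirectional=True))               # (80, 40)
```

If this reasoning is right, the second result should match what `nn.LSTM(8, 20, num_layers=2, bidirectional=True, proj_size=5).weight_ih_l1.shape` reports, which would be a quick way to check the quoted sentence against the actual module.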