I want to ask a question about the shapes of the attributes in LSTM.
Can anyone explain why the size of these variables is a multiple of 4, i.e. 4 * hidden_size?
For example, `weight_ih_l[k]`: the learnable input-hidden weights of the k-th layer,
`(W_ii|W_if|W_ig|W_io)`, of shape
**(4*hidden_size x input_size)**
I’m going to point you here: https://medium.com/mlreview/understanding-lstm-and-its-diagrams-37e2f46f1714 for the diagram.
The main idea is that we want the
i, f, g, o gates to all be of size
hidden_size. The math works out when the weight is
(4*hidden_size, input_size) because the output of the matrix multiplication is split into 4 pieces.
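To make the shape arithmetic concrete, here is a minimal sketch (the sizes are arbitrary, just for illustration): a single matrix multiplication with the stacked weight produces one output of size 4*hidden_size, which contains all four gates side by side.

```python
import torch

batch, input_size, hidden_size = 3, 10, 20

x = torch.randn(batch, input_size)
# stacked weight for the four gates: W_ii|W_if|W_ig|W_io
w_ih = torch.randn(4 * hidden_size, input_size)

out = x @ w_ih.t()
print(out.shape)  # (batch, 4*hidden_size) == (3, 80)
```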
You can have a look at how LSTMCell is implemented in the file _functions/rnn.py
You can see how the gates are computed at line 32.
gates = F.linear(input, w_ih, b_ih) + F.linear(hx, w_hh, b_hh)
Here, the weights are of shape (4*hidden_size x input_size) and (4*hidden_size x hidden_size), and the biases are of shape (4*hidden_size). The result of the linear functions will therefore be of shape (batch, 4*hidden_size). Then comes the splitting part at line 34.
ingate, forgetgate, cellgate, outgate = gates.chunk(4, 1)
Since `chunk(4, 1)` splits along dimension 1, each gate will be of shape (batch, hidden_size).
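Putting the pieces together, a minimal sketch of an LSTM cell forward pass (modeled on the gate computation quoted above; the activation choices are the standard LSTM ones, and the tensor sizes are made up for the example):

```python
import torch
import torch.nn.functional as F

def lstm_cell(x, hx, cx, w_ih, w_hh, b_ih, b_hh):
    # one big matmul per input produces all four gates at once
    gates = F.linear(x, w_ih, b_ih) + F.linear(hx, w_hh, b_hh)
    # split the (batch, 4*hidden_size) result into four gates
    ingate, forgetgate, cellgate, outgate = gates.chunk(4, 1)

    ingate = torch.sigmoid(ingate)
    forgetgate = torch.sigmoid(forgetgate)
    cellgate = torch.tanh(cellgate)
    outgate = torch.sigmoid(outgate)

    cy = forgetgate * cx + ingate * cellgate
    hy = outgate * torch.tanh(cy)
    return hy, cy

batch, input_size, hidden_size = 2, 5, 7
x = torch.randn(batch, input_size)
hx = torch.randn(batch, hidden_size)
cx = torch.randn(batch, hidden_size)
w_ih = torch.randn(4 * hidden_size, input_size)
w_hh = torch.randn(4 * hidden_size, hidden_size)
b_ih = torch.randn(4 * hidden_size)
b_hh = torch.randn(4 * hidden_size)

hy, cy = lstm_cell(x, hx, cx, w_ih, w_hh, b_ih, b_hh)
print(hy.shape, cy.shape)  # both (2, 7)
```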
The split is the part I missed. Good explanation, thank you.