I want to ask a question about the shape of the attributes in LSTM.
Can anybody explain why the size of each parameter is a multiple of 4 * hidden_size?
For example, weight_ih_l[k], the learnable input-hidden weights of the k-th layer (W_ii|W_if|W_ig|W_io), has shape **(4*hidden_size x input_size)**.
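You can check these shapes directly on an `nn.LSTM` module (sizes below are arbitrary, chosen just for illustration):

```python
import torch.nn as nn

input_size, hidden_size = 10, 20
lstm = nn.LSTM(input_size, hidden_size, num_layers=1)

# weight_ih_l0 stacks W_ii, W_if, W_ig, W_io along dim 0,
# so its first dimension is 4 * hidden_size.
print(tuple(lstm.weight_ih_l0.shape))  # (80, 10) == (4*hidden_size, input_size)
print(tuple(lstm.weight_hh_l0.shape))  # (80, 20) == (4*hidden_size, hidden_size)
print(tuple(lstm.bias_ih_l0.shape))    # (80,)    == (4*hidden_size,)
```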
The main idea is that we want the i, f, g, o gates to all be of size hidden_size. The math works out when the weight is (4*hidden_size, input_size) because the output of the matrix multiplication is split into 4 pieces.
Here, the weights have shapes (4*hidden_size x input_size) and (4*hidden_size x hidden_size), and the biases have shape (4*hidden_size). The result of the linear function will therefore have shape (4*hidden_size), and the splitting into the four gates happens in line 34.
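To make the split concrete, here is a minimal NumPy sketch of a single LSTM cell step (not PyTorch's actual implementation; the stacked row order [W_ii; W_if; W_ig; W_io] follows the docstring above):

```python
import numpy as np

input_size, hidden_size = 10, 20
rng = np.random.default_rng(0)

# Stacked weights and bias, PyTorch-style layout.
w_ih = rng.standard_normal((4 * hidden_size, input_size))
w_hh = rng.standard_normal((4 * hidden_size, hidden_size))
b = rng.standard_normal(4 * hidden_size)

x = rng.standard_normal(input_size)          # input at one time step
h = np.zeros(hidden_size)                    # previous hidden state
c = np.zeros(hidden_size)                    # previous cell state

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One linear map produces a single (4*hidden_size,) vector ...
gates = w_ih @ x + w_hh @ h + b
# ... which is split into the four gate pre-activations,
# each of size hidden_size.
i, f, g, o = np.split(gates, 4)
i, f, g, o = sigmoid(i), sigmoid(f), np.tanh(g), sigmoid(o)

c_new = f * c + i * g
h_new = o * np.tanh(c_new)
assert h_new.shape == (hidden_size,)
```

Doing the four matrix multiplications as one is just an optimization: one big matmul is faster than four small ones, which is why the parameters are stored stacked.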