I want to ask a question about the shapes of the attributes in LSTM.
Can anyone explain why the size of these variables is a multiple of 4, i.e. 4 * hidden_size?
For example, `weight_ih_l[k]`: the learnable input-hidden weights of the k-th layer,
`(W_ii|W_if|W_ig|W_io)`, of shape
**(4*hidden_size x input_size)**
I’m going to point you here: https://medium.com/mlreview/understanding-lstm-and-its-diagrams-37e2f46f1714 for the diagram.
The main idea is that we want the
i, f, g, o gates to all be of size
hidden_size. The math works out when the weight is
(4*hidden_size, input_size) because the output of the matrix multiplication is split into 4 pieces.
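To make the shape arithmetic concrete, here is a minimal sketch (the sizes are arbitrary, just for illustration): a single matrix multiplication with the stacked weight produces one output of size 4*hidden_size, which contains all four gates side by side.

```python
import torch

batch, input_size, hidden_size = 3, 10, 20

x = torch.randn(batch, input_size)
# stacked weight for the four gates: W_ii|W_if|W_ig|W_io
w_ih = torch.randn(4 * hidden_size, input_size)

out = x @ w_ih.t()
print(out.shape)  # (batch, 4*hidden_size) == (3, 80)
```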
You can have a look at how LSTMCell is implemented in the file _functions/rnn.py
You can see how the gates are computed at line 32.
gates = F.linear(input, w_ih, b_ih) + F.linear(hx, w_hh, b_hh)
Here, the weights are of shape (4*hidden_size x input_size) and (4*hidden_size x hidden_size), and the biases are of shape (4*hidden_size). The result of the linear functions will therefore be of shape (batch, 4*hidden_size). Then comes the splitting part at line 34.
ingate, forgetgate, cellgate, outgate = gates.chunk(4, 1)
Since `chunk(4, 1)` splits along dimension 1, each gate will be of shape (batch, hidden_size).
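Putting the pieces together, a minimal sketch of an LSTM cell forward pass (modeled on the gate computation quoted above; the activation choices are the standard LSTM ones, and the tensor sizes are made up for the example):

```python
import torch
import torch.nn.functional as F

def lstm_cell(x, hx, cx, w_ih, w_hh, b_ih, b_hh):
    # one big matmul per input produces all four gates at once
    gates = F.linear(x, w_ih, b_ih) + F.linear(hx, w_hh, b_hh)
    # split the (batch, 4*hidden_size) result into four gates
    ingate, forgetgate, cellgate, outgate = gates.chunk(4, 1)

    ingate = torch.sigmoid(ingate)
    forgetgate = torch.sigmoid(forgetgate)
    cellgate = torch.tanh(cellgate)
    outgate = torch.sigmoid(outgate)

    cy = forgetgate * cx + ingate * cellgate
    hy = outgate * torch.tanh(cy)
    return hy, cy

batch, input_size, hidden_size = 2, 5, 7
x = torch.randn(batch, input_size)
hx = torch.randn(batch, hidden_size)
cx = torch.randn(batch, hidden_size)
w_ih = torch.randn(4 * hidden_size, input_size)
w_hh = torch.randn(4 * hidden_size, hidden_size)
b_ih = torch.randn(4 * hidden_size)
b_hh = torch.randn(4 * hidden_size)

hy, cy = lstm_cell(x, hx, cx, w_ih, w_hh, b_ih, b_hh)
print(hy.shape, cy.shape)  # both (2, 7)
```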
The split is the part I missed. Good explanation, thank you.