LSTMCell Bias question

I read the documentation for LSTMCell: https://pytorch.org/docs/master/generated/torch.nn.LSTMCell.html

I noted that two biases are required, namely:

  • ~LSTMCell.bias_ih – the learnable input-hidden bias, of shape (4*hidden_size)
  • ~LSTMCell.bias_hh – the learnable hidden-hidden bias, of shape (4*hidden_size)

Assuming the batch size is 1, and the sizes of the input (x) and hidden (h) state are 128 and 64 respectively, then:

i(t) = [1, 128]
h(t) = [1, 64]

w(i) = [64, 128+64]
b(i) = [64]
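
For reference, here is a quick sketch to check the actual parameter shapes in PyTorch (note that weight_ih and weight_hh are kept as two separate matrices with 4*hidden_size rows, rather than one concatenated matrix):

```python
import torch
import torch.nn as nn

# Check the parameter shapes for input_size=128, hidden_size=64
cell = nn.LSTMCell(input_size=128, hidden_size=64)

print(cell.weight_ih.shape)  # torch.Size([256, 128]) -> (4*hidden_size, input_size)
print(cell.weight_hh.shape)  # torch.Size([256, 64])  -> (4*hidden_size, hidden_size)
print(cell.bias_ih.shape)    # torch.Size([256])      -> (4*hidden_size,)
print(cell.bias_hh.shape)    # torch.Size([256])      -> (4*hidden_size,)
```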

I am not sure why we need two biases for each set of activation functions (i.e. sigmoid and tanh). Since the shape of the output must equal the hidden state, one bias should be enough for each set. Why do we need two biases for each set?
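
To make the question concrete: the two biases only ever appear as the sum bias_ih + bias_hh in the gate pre-activations, so folding them into a single bias should give identical outputs. A rough check of that (just a sketch, not how the cell is implemented internally):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
cell = nn.LSTMCell(input_size=128, hidden_size=64)
x = torch.randn(1, 128)
h = torch.randn(1, 64)
c = torch.randn(1, 64)

h1, c1 = cell(x, (h, c))

# Fold both biases into bias_ih and zero out bias_hh: the outputs do not change,
# because the pre-activation only ever sees bias_ih + bias_hh.
with torch.no_grad():
    cell.bias_ih += cell.bias_hh
    cell.bias_hh.zero_()

h2, c2 = cell(x, (h, c))
print(torch.allclose(h1, h2, atol=1e-6), torch.allclose(c1, c2, atol=1e-6))  # True True
```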

Also, for the LSTM model, we have five activation functions, namely:

  1. forget gate (sigmoid)
  2. input gate (sigmoid)
  3. input gate (tanh)
  4. output gate (sigmoid)
  5. output gate (tanh)

The documentation mentions 4*hidden_size instead of 5*hidden_size. Is this because only #1 to #4 above need a bias, while #5 does not, since it is applied solely to c(t)?
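
To check my understanding of the 4 chunks, here is a rough re-implementation of a single step from the cell's own parameters, following the equations in the LSTMCell docs (the chunk order i, f, g, o matches the documented weight layout; the tanh applied to c(t) in #5 has no weights or bias of its own):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
cell = nn.LSTMCell(input_size=128, hidden_size=64)
x = torch.randn(1, 128)
h = torch.randn(1, 64)
c = torch.randn(1, 64)

# One step done manually from the cell's own parameters.
gates = x @ cell.weight_ih.T + cell.bias_ih + h @ cell.weight_hh.T + cell.bias_hh
i, f, g, o = gates.chunk(4, dim=1)   # 4*hidden_size rows split into the 4 gates
i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
g = torch.tanh(g)                    # cell candidate (the tanh in #3)
c_new = f * c + i * g
h_new = o * torch.tanh(c_new)        # the tanh in #5: applied to c(t), no weights/bias

h_ref, c_ref = cell(x, (h, c))
print(torch.allclose(h_new, h_ref, atol=1e-6), torch.allclose(c_new, c_ref, atol=1e-6))
```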

Thanks for your patience in reading this.

I guess the probable reason is:

output of activation function (sigmoid/tanh) = activation( (w(i) * i(t) + b(i)) + (w(h) * h(t-1) + b(h)) )

which is different from:

output of activation function (sigmoid/tanh) = activation( w * [i(t), h(t-1)] + b ), where [i(t), h(t-1)] is the concatenation of the input and the previous hidden state, and w and b are a single concatenated weight matrix and a single bias (as in the formulas from colah’s blog).
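
A small sketch checking that the two write-ups are algebraically the same, with the concatenated weight being [weight_ih | weight_hh] and the single bias being bias_ih + bias_hh:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
cell = nn.LSTMCell(input_size=128, hidden_size=64)
x = torch.randn(1, 128)
h = torch.randn(1, 64)

# PyTorch's split form: W_ih x + b_ih + W_hh h + b_hh
split = x @ cell.weight_ih.T + cell.bias_ih + h @ cell.weight_hh.T + cell.bias_hh

# colah-style concatenated form: W [x, h] + b, with W = [W_ih | W_hh] and b = b_ih + b_hh
W = torch.cat([cell.weight_ih, cell.weight_hh], dim=1)   # (4*hidden, input+hidden)
b = cell.bias_ih + cell.bias_hh
concat = torch.cat([x, h], dim=1) @ W.T + b

print(torch.allclose(split, concat, atol=1e-6))  # True
```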

It’s for consistency reasons (see this).
