Why does LSTM have two bias parameters?

genous110 · May 12, 2019, 10:11am

import torch

lstm = torch.nn.LSTM(10, 20, 1)
lstm.state_dict().keys()

Output:

Out[47]: odict_keys(['weight_ih_l0', 'weight_hh_l0', 'bias_ih_l0', 'bias_hh_l0'])

According to the standard LSTM equations, there should be only one bias vector. Why does PyTorch expose two bias variables, 'bias_ih_l0' and 'bias_hh_l0'?

Tony-Y · May 12, 2019, 12:08pm

https://pytorch.org/docs/stable/_modules/torch/nn/modules/rnn.html

It says that “Second bias vector is included for CuDNN compatibility. Only one bias vector is needed in standard definition.”

https://docs.nvidia.com/deeplearning/sdk/cudnn-developer-guide/index.html#cudnnRNNMode_t
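A quick way to see what "only one bias vector is needed" means in practice: in PyTorch's LSTM the two biases are both added to the same gate pre-activations, so only their sum should affect the output. The sketch below is just an illustration of that, not code from the PyTorch docs; the tensor sizes, the seed, and the tolerance are arbitrary assumptions. It folds both biases into bias_ih_l0, zeroes bias_hh_l0, and compares the outputs before and after.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only for reproducibility

lstm = torch.nn.LSTM(10, 20, 1)
x = torch.randn(5, 3, 10)  # (seq_len, batch, input_size), sizes chosen arbitrarily

with torch.no_grad():
    # Output with the two separate bias vectors as initialized.
    out_two, _ = lstm(x)

    # Fold both biases into bias_ih_l0 and zero out bias_hh_l0.
    merged = lstm.bias_ih_l0 + lstm.bias_hh_l0
    lstm.bias_ih_l0.copy_(merged)
    lstm.bias_hh_l0.zero_()

    # Output with effectively a single bias vector.
    out_one, _ = lstm(x)

# The tolerance is an assumption to absorb floating-point rounding.
same = torch.allclose(out_two, out_one, atol=1e-5)
print(same)
```

If the two outputs match up to rounding, the second bias adds no modeling capacity and is there only to mirror the parameter layout that cuDNN expects.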

sh0416 · June 10, 2021, 9:58am

I think the two bias terms act differently.

The main point is that bias_ih is applied once during the computation along the time axis, while bias_hh is applied cumulatively along the time axis.

I wanted to clarify this with an illustrative example, but the process is too complicated.