Why does LSTM have two bias parameters?

lstm = torch.nn.LSTM(10, 20, 1)
lstm.state_dict().keys()  # list the parameter names of the layer

Output result:

Out[47]: odict_keys(['weight_ih_l0', 'weight_hh_l0', 'bias_ih_l0', 'bias_hh_l0'])

According to the LSTM equations, there should be only one bias. Why does the module expose two bias variables, 'bias_ih_l0' and 'bias_hh_l0'?


It says that “Second bias vector is included for CuDNN compatibility. Only one bias vector is needed in standard definition.”
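
A quick way to check this is to fold one bias into the other and compare outputs. The snippet below is only a sketch: the sequence length, batch size, seed, and tolerance are arbitrary choices of mine, and it relies on the gate equations in the torch.nn.LSTM documentation, where both biases are added to the same pre-activation.

```python
import torch

torch.manual_seed(0)
lstm = torch.nn.LSTM(10, 20, 1)
x = torch.randn(5, 3, 10)  # (seq_len, batch, input_size), arbitrary sizes

with torch.no_grad():
    # Output with the two separate bias vectors as initialized.
    out_two, _ = lstm(x)

    # Fold bias_hh_l0 into bias_ih_l0, then zero bias_hh_l0,
    # so that effectively a single bias vector remains.
    lstm.bias_ih_l0.add_(lstm.bias_hh_l0)
    lstm.bias_hh_l0.zero_()
    out_one, _ = lstm(x)

# Compare the two outputs up to floating-point rounding.
same = torch.allclose(out_two, out_one, atol=1e-5)
print(same)
```

If the two outputs agree up to rounding, the layer behaves as if it had a single bias equal to bias_ih_l0 + bias_hh_l0.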


I think the two bias terms act differently.

The main point is that bias_ih is applied once during the computation along the time axis, while bias_hh is applied cumulatively along the time axis.

I would like to clarify this with an illustrative example, but the process is quite complicated.
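
One way to make this concrete is to unroll the layer by hand and see where each bias enters. This is a minimal sketch, assuming the gate order (input, forget, cell, output) and the zero initial state described in the torch.nn.LSTM documentation; the sizes, seed, and tolerance are arbitrary. In this unrolling both bias_ih_l0 and bias_hh_l0 are added to the gate pre-activations at every time step, and the result is compared against the built-in layer.

```python
import torch

torch.manual_seed(0)
lstm = torch.nn.LSTM(10, 20, 1)
x = torch.randn(5, 3, 10)  # (seq_len, batch, input_size), arbitrary sizes

W_ih, W_hh = lstm.weight_ih_l0, lstm.weight_hh_l0  # (80, 10) and (80, 20)
b_ih, b_hh = lstm.bias_ih_l0, lstm.bias_hh_l0      # (80,) each

h = torch.zeros(3, 20)  # initial hidden state
c = torch.zeros(3, 20)  # initial cell state
steps = []
with torch.no_grad():
    for x_t in x:  # iterate over the time axis
        # Both biases enter the same pre-activation at every step.
        gates = x_t @ W_ih.T + b_ih + h @ W_hh.T + b_hh
        i, f, g, o = gates.chunk(4, dim=1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        g = torch.tanh(g)
        c = f * c + i * g
        h = o * torch.tanh(c)
        steps.append(h)
    manual = torch.stack(steps)   # (seq_len, batch, hidden_size)
    reference, _ = lstm(x)        # output of the built-in layer

# Compare the hand-unrolled output with nn.LSTM up to rounding.
matches = torch.allclose(manual, reference, atol=1e-5)
print(matches)
```

Comparing manual against reference shows whether this per-step treatment of the two biases reproduces what nn.LSTM computes.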