Why does an RNN need two biases?

Sunnydreamrain · February 17, 2017, 9:56am

In the RNN implementation, there are two biases, b_ih and b_hh.
Why is this? Is it any different from using just one bias?
Does it affect performance or efficiency?

Ismail_Elezi · February 17, 2017, 10:24am

You mean in general? For the same reason that it needs two sets of weights, one for the input and one for the previous state.

Sunnydreamrain · February 17, 2017, 10:33am

Take an RNN with tanh activation as an example. It computes
h_t = tanh(w_{ih} * x_t + b_{ih} + w_{hh} * h_{(t-1)} + b_{hh})
b_{ih} and b_{hh} are just biases (trainable constants). Neither one depends on the input or the previous state.
So if we let b = b_{ih} + b_{hh}, then
h_t = tanh(w_{ih} * x_t + w_{hh} * h_{(t-1)} + b).
That should be the same model.
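
A quick numerical check of this claim, as a rough sketch only. It uses scalar weights and plain Python instead of the real implementation, and the function names step_two_biases and step_one_bias are made up for illustration. The vector case works the same way, because both biases are added elementwise before the tanh.

```python
import math
import random

def step_two_biases(x, h_prev, w_ih, w_hh, b_ih, b_hh):
    # h_t = tanh(w_ih * x_t + b_ih + w_hh * h_(t-1) + b_hh)
    return math.tanh(w_ih * x + b_ih + w_hh * h_prev + b_hh)

def step_one_bias(x, h_prev, w_ih, w_hh, b):
    # h_t = tanh(w_ih * x_t + w_hh * h_(t-1) + b)
    return math.tanh(w_ih * x + w_hh * h_prev + b)

random.seed(0)
w_ih, w_hh, b_ih, b_hh = (random.uniform(-1, 1) for _ in range(4))
xs = [random.uniform(-1, 1) for _ in range(5)]  # a short input sequence

h_two = h_one = 0.0  # both recurrences start from the same hidden state
max_diff = 0.0
for x in xs:
    h_two = step_two_biases(x, h_two, w_ih, w_hh, b_ih, b_hh)
    h_one = step_one_bias(x, h_one, w_ih, w_hh, b_ih + b_hh)
    max_diff = max(max_diff, abs(h_two - h_one))

# Largest gap between the two hidden-state sequences; any nonzero value
# comes only from floating-point rounding in the order of the additions.
print(max_diff)
```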

apaszke · February 17, 2017, 12:54pm

As you pointed out, it doesn’t really change the definition of the model, but this is what cuDNN does, so we’ve made our RNNs consistent with this behaviour.
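
For anyone who wants to see this on an actual module, here is a small sketch, assuming a single-layer, unidirectional nn.RNN running on CPU. The sizes are arbitrary. It lists the two bias parameters, then folds b_hh into b_ih and checks that the output does not change.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.RNN(input_size=3, hidden_size=4, nonlinearity="tanh")

# Collect the names of the bias parameters the module exposes.
bias_names = [name for name, _ in rnn.named_parameters() if "bias" in name]
print(bias_names)

x = torch.randn(5, 1, 3)  # (seq_len, batch, input_size)

with torch.no_grad():
    out_two, _ = rnn(x)                   # both biases as initialised
    rnn.bias_ih_l0.add_(rnn.bias_hh_l0)   # b = b_ih + b_hh, stored in b_ih
    rnn.bias_hh_l0.zero_()                # b_hh no longer contributes
    out_one, _ = rnn(x)                   # effectively a single bias

same = torch.allclose(out_two, out_one, atol=1e-6)
print(same)
```

If a single bias vector per layer is what you need, one option is to merge the two after training like this. The module still stores both tensors either way.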

Sunnydreamrain · February 17, 2017, 12:55pm

Okay. Good to know. Thanks.