Why are there two bias terms in RNNCell when they are pointwise added together? Wouldn’t this be equivalent to using one bias and doubling its gradient? Although I’m not sure if doubling the gradient is the desired behavior…


Yes, it’d be equivalent to learning a single bias term. I guess it’s just convention to learn two bias terms for an Elman cell (or we just implemented it exactly as the formula says, without thinking it through).

Here’s the relevant code, which I double-checked: https://github.com/pytorch/pytorch/blob/master/torch/nn/_functions/rnn.py#L14
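To make the equivalence concrete, here is a minimal sketch in plain Python rather than PyTorch itself. It assumes the usual Elman update h' = tanh(W_ih x + b_ih + W_hh h + b_hh); the function `elman_cell`, the toy weights, and the loss L = sum(h') are made up for illustration and are not taken from the linked file. It checks that merging the two biases leaves the output unchanged, and that under plain SGD the summed bias moves twice as far per step as a single bias would, since both biases receive the same gradient.

```python
import math

def elman_cell(x, h, w_ih, w_hh, b_ih, b_hh):
    """One Elman step: h' = tanh(W_ih x + b_ih + W_hh h + b_hh)."""
    out = []
    for i in range(len(b_ih)):
        pre = sum(w * xj for w, xj in zip(w_ih[i], x)) + b_ih[i]
        pre += sum(w * hj for w, hj in zip(w_hh[i], h)) + b_hh[i]
        out.append(math.tanh(pre))
    return out

# Toy values, chosen arbitrarily for the illustration.
x = [0.5, -1.0]
h = [0.1, 0.2]
w_ih = [[0.3, -0.2], [0.1, 0.4]]
w_hh = [[0.05, 0.6], [-0.7, 0.2]]
b_ih = [0.1, -0.3]
b_hh = [0.25, 0.05]

# Forward pass with two separate biases.
two_bias = elman_cell(x, h, w_ih, w_hh, b_ih, b_hh)

# Forward pass with one merged bias (the second bias is set to zero).
b_sum = [a + b for a, b in zip(b_ih, b_hh)]
zeros = [0.0] * len(b_sum)
one_bias = elman_cell(x, h, w_ih, w_hh, b_sum, zeros)

same_output = all(
    math.isclose(a, b, rel_tol=1e-12, abs_tol=1e-12)
    for a, b in zip(two_bias, one_bias)
)

# For L = sum(h'), dL/db_ih = dL/db_hh = 1 - h'^2: both biases see the same gradient.
lr = 0.1
grad = [1.0 - v * v for v in two_bias]

# Plain SGD step: how far does the effective (summed) bias move in each setup?
step_two = [2 * lr * g for g in grad]  # b_ih and b_hh each move by lr * g
step_one = [lr * g for g in grad]      # a single merged bias moves by lr * g

doubled_step = all(
    math.isclose(s2, 2 * s1, rel_tol=1e-12) for s1, s2 in zip(step_one, step_two)
)

print(same_output, doubled_step)
```

So the two formulations represent the same function, and with plain SGD the only difference is an effective doubling of the bias learning rate. The sketch does not cover weight decay or adaptive optimizers.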
