Same gradient for input-hidden bias and hidden-hidden bias

Hi,
I am using a Bi-LSTM with hidden_size=512 and num_layers=1.
I am observing that the gradients of lstm.bias_ih_l0 and lstm.bias_hh_l0 are always the same.
The same holds for lstm.bias_ih_l0_reverse and lstm.bias_hh_l0_reverse.

Is this expected behavior?

Example:
embedding.weight tensor(-0.0028, device='cuda:0')
lstm.weight_ih_l0 tensor(-0.0399, device='cuda:0')
lstm.weight_hh_l0 tensor(-2.7399, device='cuda:0')
lstm.bias_ih_l0 tensor(0.0912, device='cuda:0')
lstm.bias_hh_l0 tensor(0.0912, device='cuda:0')
lstm.weight_ih_l0_reverse tensor(-0.0791, device='cuda:0')
lstm.weight_hh_l0_reverse tensor(-1.5359, device='cuda:0')
lstm.bias_ih_l0_reverse tensor(0.0834, device='cuda:0')
lstm.bias_hh_l0_reverse tensor(0.0834, device='cuda:0')
outputLayer.weight tensor(3.5665, device='cuda:0')

Thanks!

This may or may not happen. Is it happening for all inputs?
If you can supply a reproducible script, it would help with debugging. Otherwise, any answer would be more speculation than the actual cause.
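For what it's worth, a minimal repro could look something like the sketch below. The vocabulary size, embedding dimension, data, and loss are placeholder assumptions (not your actual setup); it just builds an embedding + Bi-LSTM + linear stack matching the parameter names above, runs one backward pass, prints a per-parameter gradient summary in the same style as your output, and then compares the two bias gradients directly:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Placeholder sizes; only hidden_size=512 and num_layers=1 come from the post.
vocab_size, embed_dim, hidden_size = 100, 32, 512

embedding = nn.Embedding(vocab_size, embed_dim)
lstm = nn.LSTM(embed_dim, hidden_size, num_layers=1,
               bidirectional=True, batch_first=True)
output_layer = nn.Linear(2 * hidden_size, vocab_size)

# Random toy batch: (batch=4, seq_len=10).
tokens = torch.randint(0, vocab_size, (4, 10))
targets = torch.randint(0, vocab_size, (4, 10))

out, _ = lstm(embedding(tokens))
logits = output_layer(out)
loss = nn.functional.cross_entropy(logits.reshape(-1, vocab_size),
                                   targets.reshape(-1))
loss.backward()

# Per-parameter gradient means, in the same style as the printout above.
for label, module in [("embedding", embedding), ("lstm", lstm), ("outputLayer", output_layer)]:
    for pname, p in module.named_parameters():
        print(f"{label}.{pname}", p.grad.mean())

# Direct check: are the input-hidden and hidden-hidden bias gradients identical?
print(torch.allclose(lstm.bias_ih_l0.grad, lstm.bias_hh_l0.grad))
print(torch.allclose(lstm.bias_ih_l0_reverse.grad, lstm.bias_hh_l0_reverse.grad))
```

Running something like this independently of your data and training loop would show whether the matching bias gradients depend on your specific inputs.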