Hi,
I am using a Bi-LSTM with hidden size=512, num_layers=1.
I am observing that the gradients of lstm.bias_ih_l0 and lstm.bias_hh_l0 are always the same.
Same for lstm.bias_ih_l0_reverse and lstm.bias_hh_l0_reverse.
Is this expected behavior?
Example:
embedding.weight tensor(-0.0028, device='cuda:0')
lstm.weight_ih_l0 tensor(-0.0399, device='cuda:0')
lstm.weight_hh_l0 tensor(-2.7399, device='cuda:0')
lstm.bias_ih_l0 tensor(0.0912, device='cuda:0')
lstm.bias_hh_l0 tensor(0.0912, device='cuda:0')
lstm.weight_ih_l0_reverse tensor(-0.0791, device='cuda:0')
lstm.weight_hh_l0_reverse tensor(-1.5359, device='cuda:0')
lstm.bias_ih_l0_reverse tensor(0.0834, device='cuda:0')
lstm.bias_hh_l0_reverse tensor(0.0834, device='cuda:0')
outputLayer.weight tensor(3.5665, device='cuda:0')
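For what it's worth, here is a minimal standalone sketch that reproduces the observation on CPU (the sizes and input are made up, not my actual model). In PyTorch's LSTM equations, bias_ih and bias_hh are both added to the same gate pre-activations, so their gradients come out identical:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative 1-layer Bi-LSTM; hyperparameters are arbitrary.
lstm = nn.LSTM(input_size=8, hidden_size=16, num_layers=1, bidirectional=True)
x = torch.randn(5, 3, 8)  # (seq_len, batch, input_size)

out, _ = lstm(x)
out.sum().backward()

# bias_ih and bias_hh enter the same pre-activation in every gate,
# so backprop assigns them the same gradient.
print(torch.allclose(lstm.bias_ih_l0.grad, lstm.bias_hh_l0.grad))                  # True
print(torch.allclose(lstm.bias_ih_l0_reverse.grad, lstm.bias_hh_l0_reverse.grad))  # True
```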
Thanks!