I put this code in the SGD optimizer:
```python
print('Parameter before update', p)
print('Slope', d_p)
print('X', X)
print('Learning rate', -group['lr'])
check = p.clone()
p.data.add_(-group['lr'], d_p)  # in-place: p <- p - lr * d_p
print('Parameter of the tensor after update', p)
print('Sanity check', check + (-group['lr']) * d_p)
```
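(If you are on a recent PyTorch, note that the two-positional-argument form of `add_` is deprecated; assuming the same `group` and `d_p`, the equivalent keyword form would be:)

```python
# Non-deprecated spelling of the same in-place update:
# p <- p + alpha * d_p, with alpha = -lr
p.data.add_(d_p, alpha=-group['lr'])
```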
And I see that the bias gets updated:
```
Parameter before update Parameter containing:
tensor([[-2.0561, -4.0162, -6.9413]], requires_grad=True)
Slope tensor([[ -5.2444, -10.4888, -15.7332]])
X tensor([[0.1000, 0.2000, 0.3000]])
Learning rate -1
Parameter of the tensor after update Parameter containing:
tensor([[3.1883, 6.4726, 8.7918]], requires_grad=True)
Sanity check tensor([[3.1883, 6.4726, 8.7918]], grad_fn=<AddBackward0>)
Parameter before update Parameter containing:
tensor([-22.1307], requires_grad=True)
Slope tensor([-52.4439])
X tensor([[0.1000, 0.2000, 0.3000]])
Learning rate -1
Parameter of the tensor after update Parameter containing:
tensor([30.3132], requires_grad=True)
Sanity check tensor([30.3132], grad_fn=<AddBackward0>)
```
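One thing I notice in these prints: each weight slope is exactly the matching entry of X times the bias slope:

```python
# Each printed weight slope equals X[k] * (bias slope):
print(0.1 * -52.4439, 0.2 * -52.4439, 0.3 * -52.4439)
# -5.24439 -10.48878 -15.73317  (matches the printed weight Slope, up to rounding)
```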
Of particular interest, the bias:

```
Parameter before update Parameter containing:
tensor([-22.1307], requires_grad=True)
Parameter of the tensor after update Parameter containing:
tensor([30.3132], requires_grad=True)
```
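Redoing that bias step by hand with the printed numbers (the print shows -lr = -1, so lr = 1):

```python
# Manual check of the bias update: b_new = b - lr * d_b
b_before = -22.1307
d_b = -52.4439   # the printed "Slope" for the bias
lr = 1
print(b_before - lr * d_b)  # ~30.3132, matching the "after update" print
```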
I thought that for f(x) = wx + b, the derivative w.r.t. w is x, and that the bias would not get updated, because the derivative of the bias, which is a constant, is zero.
Why does the bias get updated?
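For completeness, here is a minimal sketch that reproduces the printed gradients; the names `x`, `w`, `b` and the upstream gradient value -52.4439 are my reconstruction from the logs above, not the actual training code:

```python
import torch

# Values reconstructed from the prints above (assumptions, not the real model):
x = torch.tensor([[0.1000, 0.2000, 0.3000]])
w = torch.tensor([[-2.0561, -4.0162, -6.9413]], requires_grad=True)
b = torch.tensor([-22.1307], requires_grad=True)

y = x @ w.t() + b                        # f(x) = w.x + b
y.backward(torch.tensor([[-52.4439]]))   # assumed upstream gradient from the loss

print(w.grad)  # tensor([[ -5.2444, -10.4888, -15.7332]])  = upstream * x
print(b.grad)  # tensor([-52.4439])  -- the bias gradient is not zero
```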