Understanding gradient descent in PyTorch

Hi,
I have a question about gradient descent. The function below tries to recover the values of w and b. After backpropagating, the value of w.grad (w is initialized to 1.0) should have been negative, but I am getting a large positive number.
A negative gradient indicates that w should be increased, while a positive gradient indicates that w should be reduced.
Am I missing something? Please help me resolve this. Thank you.

import torch
x = torch.rand([20, 1], requires_grad=True)
y = 3*x - 2
w = torch.tensor([1.], requires_grad = True)
b = torch.tensor([1.], requires_grad = True)

y_hat = w*x + b

loss = torch.sum((y_hat - y)**2)
print(loss) # tensor(79.0468, grad_fn=<SumBackward0>)

loss.backward()
print(w.grad, b.grad) # tensor([35.0812]) tensor([76.0854])
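
As a sanity check on the sign convention itself, here is a minimal one-parameter sketch (an illustration with a made-up loss, separate from the code above): for loss = (w - 3)^2 at w = 1, the gradient is 2*(1 - 3) = -4, so a negative gradient really does mean "increase w" in that simple case.

import torch

# One-parameter sanity check of the sign convention:
# loss = (w - 3)^2 is minimized at w = 3.
w = torch.tensor(1., requires_grad=True)
loss = (w - 3) ** 2
loss.backward()
# d(loss)/dw = 2*(w - 3) = -4 at w = 1:
# negative, so w should move up toward 3.
print(w.grad)  # tensor(-4.)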

If you derive the gradient by hand, you can see that nothing is wrong with the result:

grad_w = (2*(y_hat-y)*(x)).sum()
grad_b = (2*(y_hat-y)*(1)).sum()

With w = 1 and b = 1 the model overestimates y at every point: y_hat - y = (x + 1) - (3x - 2) = 3 - 2x, which is positive for all x in [0, 1). Both gradients therefore come out positive, even though w eventually has to grow toward 3, because the error contributed by b (which must fall from 1 to -2) dominates. The gradient only gives the locally steepest direction for the current (w, b) pair; once both parameters are updated together, gradient descent still drives them to w = 3, b = -2. The full script below confirms that the analytic gradients match autograd:
import torch

# Same setup as in the question
x = torch.rand([20, 1], requires_grad=True)
y = 3*x - 2
w = torch.tensor([1.], requires_grad=True)
b = torch.tensor([1.], requires_grad=True)

y_hat = w*x + b

loss = torch.sum((y_hat - y)**2)
print(loss)

loss.backward()
print(w.grad, b.grad)

# Manual gradients from the formulas above
grad_w = (2*(y_hat-y)*(x)).sum()
grad_b = (2*(y_hat-y)*(1)).sum()

print(grad_w, grad_b)

output:
tensor(71.8559, grad_fn=<SumBackward0>)
tensor([37.8953]) tensor([73.1675])
tensor(37.8953, grad_fn=<SumBackward0>) tensor(73.1675, grad_fn=<SumBackward0>)
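
To see the positive gradient play out, here is a minimal gradient-descent loop (a sketch: the learning rate of 0.01 and the 2000 steps are arbitrary illustrative choices, not from the posts above). Both parameters converge to the true values even though w.grad starts positive.

import torch

# Minimal gradient-descent loop; lr and the step count are
# illustrative choices, not tuned values.
torch.manual_seed(0)
x = torch.rand([20, 1])  # no requires_grad needed on the data
y = 3*x - 2
w = torch.tensor([1.], requires_grad=True)
b = torch.tensor([1.], requires_grad=True)

lr = 0.01
for step in range(2000):
    y_hat = w*x + b
    loss = torch.sum((y_hat - y)**2)
    loss.backward()
    with torch.no_grad():
        w -= lr * w.grad  # positive grad nudges w down at first,
        b -= lr * b.grad  # but the joint updates still converge
    w.grad.zero_()
    b.grad.zero_()

print(w.item(), b.item())  # approaches 3.0 and -2.0

In this run w first dips below its starting value before climbing toward 3, which is exactly the behaviour the positive initial gradient predicts: once b has fallen far enough, the sign of w.grad flips and w rises to its true value.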
