Simple understanding of backprop / gradient descent

snip3r77 · March 18, 2020, 4:24am

Please let me know if my understanding is correct

We set requires_grad=True on a because we want to optimize this parameter.
We call loss.backward() because we want to minimize the loss to find a.

thanks

a = torch.tensor([-5.,5.], requires_grad=True)
def update():
    # calculate y_hat
    y_hat = x@a
    # calculate loss
    loss = mse(y, y_hat)
    # print loss every 10 loops
    if t % 10 == 0: print(loss)
    # compute the derivatives, you can call .backward()
    loss.backward()
    # To prevent tracking history (and using memory), you can also wrap the code block in with torch.no_grad():
    with torch.no_grad():
        a.sub_(lr * a.grad)  # w(t) = w(t-1) - lr dL/dw(t-1)
        a.grad.zero_()

full code
https://i.imgur.com/UCqEGal.png
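
Since the full code is only available as a screenshot, here is a minimal runnable sketch of the same loop. The synthetic data, the mse helper, the learning rate, and passing t as an argument are all assumptions made for illustration; they are not taken from the screenshot.

```python
import torch

torch.manual_seed(0)

# Synthetic data (an assumption): y = 3*x1 + 2 plus a little noise.
# The second column of ones plays the role of the intercept.
n = 100
x = torch.ones(n, 2)
x[:, 0].uniform_(-1.0, 1.0)
true_a = torch.tensor([3.0, 2.0])
y = x @ true_a + 0.1 * torch.randn(n)

def mse(y_true, y_pred):
    # Mean squared error (assumed to match the mse helper in the full code).
    return ((y_true - y_pred) ** 2).mean()

a = torch.tensor([-5.0, 5.0], requires_grad=True)
lr = 0.1  # assumed learning rate

def update(t):
    y_hat = x @ a                  # forward pass
    loss = mse(y, y_hat)
    if t % 10 == 0:
        print(t, loss.item())
    loss.backward()                # fills a.grad with dL/da
    with torch.no_grad():          # the update itself must not be tracked
        a.sub_(lr * a.grad)        # gradient descent step
        a.grad.zero_()             # reset, otherwise gradients accumulate
    return loss.item()

losses = [update(t) for t in range(200)]
print(a)
```

With this setup the printed loss should shrink over the iterations and a should end up close to true_a.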

albanD · March 18, 2020, 3:12pm

It’s mostly correct, yes. I would say it slightly differently:

a.requires_grad=True means that we will ask for gradients later, so autograd should save everything it needs to be able to compute them.
a.is_leaf=True means that a was not computed in a differentiable manner (it was either created by the user or computed from Tensors that don’t require gradients). For such a Tensor, if it requires gradients and we compute them, they are saved in a.grad.
loss.backward() means: compute the gradients for all the Tensors that require gradients and were used to create loss. As a side effect, this populates the .grad field of all the leaves that require gradients.
optimizer.step() is then used to update a using a.grad (note that the negative of a.grad gives a direction of descent for the loss above).
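
The first three points can be checked on a tiny example. This is only a sketch: the 2x2 input, the targets, and the hand-written mean squared error are made up for illustration and are not the data from the question.

```python
import torch

x = torch.tensor([[1.0, 2.0], [3.0, 4.0]])   # data, no gradients needed
y = torch.tensor([1.0, 2.0])
a = torch.tensor([-5.0, 5.0], requires_grad=True)

y_hat = x @ a                      # computed from a, so it has autograd history
loss = ((y - y_hat) ** 2).mean()

print(a.is_leaf, y_hat.is_leaf)    # a is a leaf, y_hat is not
grad_before = a.grad
print(grad_before)                 # nothing stored before backward()

loss.backward()                    # populates a.grad as a side effect

# Analytic gradient of the mean squared error for comparison:
# dL/da = -(2/n) * x^T (y - y_hat)
manual_grad = -2.0 / len(y) * (x.T @ (y - y_hat.detach()))
print(a.grad, manual_grad)
```

The two printed gradients should agree, which shows that a.grad holds dL/da for the leaf a after the backward call.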

snip3r77 · March 19, 2020, 2:24am

Can you please explain a.is_leaf=True, as I did not use this?
How do I incorporate it into my code?

Also, what do you mean by optimizer.step()?

with torch.no_grad():
    # does a.optimizer.step() replace the lines below?
    a.sub_(lr * a.grad)  # w(t) = w(t-1) - lr dL/dw(t-1)
    a.grad.zero_()

Thanks.

albanD · March 19, 2020, 3:55pm

Can you please explain a.is_leaf=True, as I did not use this?

A Tensor is a leaf if it does not have any “history” in the autograd sense, meaning that gradients cannot be propagated from this Tensor back to any parents.
There are two main cases here:

A leaf that does not require gradients is a Tensor for which gradients are not tracked.
A leaf that requires gradients is one where the user explicitly asked for gradients, so its .grad field will be populated when you call .backward().
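
A small sketch of both cases, plus a non-leaf for contrast. The tensor names and values are hypothetical and chosen only for illustration.

```python
import torch

data = torch.tensor([1.0, 2.0])                        # leaf, gradients not tracked
param = torch.tensor([3.0, 4.0], requires_grad=True)   # leaf that requires gradients
derived = data * 2             # computed only from untracked tensors: still a leaf
out = (param * data).sum()     # has autograd history: not a leaf

for name, t in [("data", data), ("param", param), ("derived", derived), ("out", out)]:
    print(name, t.is_leaf, t.requires_grad, t.grad_fn)

out.backward()
print(param.grad)   # populated by backward(); here it equals data
print(data.grad)    # stays None, nothing was tracked for data
```

Only out has a grad_fn, which is the “history” that lets gradients flow back to param.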

How do I incorporate it into my code?

You don’t need to. It is just there in case you want to look in more detail: you can check whether a Tensor is a leaf or not.

Also, what do you mean by optimizer.step()?

Yes exactly, we have built-in optimizers to do this kind of update, but you can do it by hand as well.
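
As a sketch of that equivalence, the snippet below takes one manual step and one step with torch.optim.SGD from the same starting point. The data, the loss_fn helper, and the learning rate are assumptions for illustration. Note that step() is called on the optimizer object that was given a, not on a itself.

```python
import torch

x = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
y = torch.tensor([1.0, 2.0])
lr = 0.01  # assumed learning rate

def loss_fn(a):
    # Mean squared error of the linear model x @ a (hypothetical helper).
    return ((y - x @ a) ** 2).mean()

# Manual update, as in the question.
a_manual = torch.tensor([-5.0, 5.0], requires_grad=True)
loss_fn(a_manual).backward()
with torch.no_grad():
    a_manual.sub_(lr * a_manual.grad)
    a_manual.grad.zero_()

# The same step with a built-in optimizer.
a_optim = torch.tensor([-5.0, 5.0], requires_grad=True)
optimizer = torch.optim.SGD([a_optim], lr=lr)
loss_fn(a_optim).backward()
optimizer.step()        # plays the role of a.sub_(lr * a.grad)
optimizer.zero_grad()   # clears the gradient for the next iteration

print(a_manual, a_optim)
```

Plain SGD without momentum or weight decay applies the same rule as the manual update, so both tensors should match after one step.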