Confusion about loss.backward() and how it updates gradients


I’m an experienced programmer, familiar with neural nets (theoretically) but new to PyTorch. I’m trying out PyTorch in C++ and am currently writing the training stage of my simple network (1 hidden layer).

I have an optimizer and have calculated a loss tensor, all according to the tutorial found here. The tutorial then calls d_loss_fake.backward(). I did some digging, and this apparently computes the gradients of the network. I am confused, however, about how it achieves this, as I could not find any relation between the loss tensor and the network. Concretely: how does .backward() know which weights need gradients if there is no visible connection between the loss tensor and the network?

Sorry for the beginner question. I hope this isn’t a duplicate; I’ve been googling for a couple of hours now. Thanks in advance!


Maybe a starting point is

Behind the scenes, PyTorch tracks all operations on tensors with requires_grad == true and builds a computation graph during the forward pass. It knows how the loss value was calculated and can automatically back-propagate the gradient step by step from the loss (or any scalar model output) to the model parameters.
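
To make that "hidden connection" concrete, here is a toy sketch of the idea in plain C++. It is not PyTorch's actual implementation, and every name in it (Node, Var, make_var, mul, sub, backward, squared_error) is made up for illustration. The point is that each result remembers which inputs produced it, so the loss carries a path back to the weights even though you never pass the network to backward(). In real PyTorch the equivalent link is the grad_fn attached to each tensor. Also note that backward() only fills in the gradients; the weights themselves change later, when you call the optimizer's step().

```cpp
#include <functional>
#include <memory>
#include <unordered_set>
#include <vector>

// Toy scalar "tensor": a value, its gradient, and links back to its inputs.
struct Node {
    double value = 0.0;
    double grad = 0.0;                            // filled in by backward()
    std::vector<std::shared_ptr<Node>> parents;   // inputs that produced this node
    std::function<void(Node&)> backward_fn;       // pushes this node's grad to its parents
};
using Var = std::shared_ptr<Node>;

Var make_var(double v) {
    auto n = std::make_shared<Node>();
    n->value = v;
    return n;
}

// Each operation records its inputs on the result: this is the "graph".
Var mul(const Var& a, const Var& b) {
    auto out = make_var(a->value * b->value);
    out->parents = {a, b};
    out->backward_fn = [](Node& self) {
        Node& a = *self.parents[0];
        Node& b = *self.parents[1];
        a.grad += b.value * self.grad;   // d(a*b)/da = b
        b.grad += a.value * self.grad;   // d(a*b)/db = a
    };
    return out;
}

Var sub(const Var& a, const Var& b) {
    auto out = make_var(a->value - b->value);
    out->parents = {a, b};
    out->backward_fn = [](Node& self) {
        self.parents[0]->grad += self.grad;   // d(a-b)/da = 1
        self.parents[1]->grad -= self.grad;   // d(a-b)/db = -1
    };
    return out;
}

// Depth-first walk that lists every node after all of its inputs.
void topo_sort(const Var& n, std::unordered_set<Node*>& seen, std::vector<Var>& order) {
    if (!seen.insert(n.get()).second) return;   // already visited
    for (const auto& p : n->parents) topo_sort(p, seen, order);
    order.push_back(n);
}

// Walk the recorded graph from the loss back to the leaves (the "weights").
void backward(const Var& loss) {
    std::unordered_set<Node*> seen;
    std::vector<Var> order;
    topo_sort(loss, seen, order);
    loss->grad = 1.0;   // d(loss)/d(loss)
    for (auto it = order.rbegin(); it != order.rend(); ++it) {
        if ((*it)->backward_fn) (*it)->backward_fn(**it);
    }
}

// Hypothetical one-weight "network": loss = (w * x - y)^2.
Var squared_error(const Var& w, const Var& x, const Var& y) {
    Var diff = sub(mul(w, x), y);
    return mul(diff, diff);
}
```

With w = 3, x = 2 and y = 4, squared_error gives a loss of 4, and after backward(loss) the weight's gradient w->grad is 8, which matches 2 * (w*x - y) * x worked out by hand. Nothing except the loss was handed to backward(); it reached w purely by following the parents recorded during the forward pass.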
