Confusion about loss.backward() and how it updates gradients

Maybe a starting point is https://pytorch.org/docs/stable/notes/autograd.html

Behind the scenes PyTorch tracks all operations on tensors with requires_grad == True and builds a computation graph during the forward pass. It knows how the loss value was calculated and can automatically back-propagate the gradients step by step from the loss (or any scalar model output) to the model parameters.
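
A minimal sketch (using a small nn.Linear layer just for illustration) showing that calling loss.backward() fills the parameters' .grad attributes:

```python
import torch

# Tiny model: its parameters have requires_grad == True by default
model = torch.nn.Linear(3, 1)
x = torch.randn(4, 3)            # input batch (no gradient tracking needed)
target = torch.randn(4, 1)

# Forward pass: each operation is recorded in the autograd graph
output = model(x)
loss = torch.nn.functional.mse_loss(output, target)

print(model.weight.grad)         # None - backward() hasn't been called yet

# Backward pass: gradients of the loss w.r.t. each parameter are
# accumulated into the parameters' .grad attributes
loss.backward()

print(model.weight.grad.shape)   # torch.Size([1, 3])
print(model.bias.grad)           # tensor holding the bias gradient
```

Note that backward() only computes and accumulates the gradients; the actual parameter update happens when you call something like optimizer.step() afterwards.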
