A good starting point might be https://pytorch.org/docs/stable/notes/autograd.html
Behind the scenes, PyTorch tracks every operation on tensors with requires_grad=True
and builds a computation graph during the forward pass. It knows how the loss value was calculated, so it can automatically back-propagate the gradient step by step from the loss (or any scalar model output) to the model parameters.
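For example, here is a minimal sketch with a made-up scalar "model" (just w * x, not anything from your code) that shows autograd recording the graph and then walking it backward:

```python
import torch

# Toy "model": a single weight with gradient tracking enabled.
w = torch.tensor(2.0, requires_grad=True)
x = torch.tensor(3.0)

# Forward pass: autograd records each operation in a computation graph.
y = w * x            # y = 6.0, carries grad_fn=<MulBackward0>
loss = (y - 5.0) ** 2

# Backward pass: walks the recorded graph from `loss` back to `w`.
loss.backward()

print(loss.grad_fn)  # the last node of the recorded graph
print(w.grad)        # d(loss)/dw = 2*(y-5)*x = 2*1*3 = 6.0
```

Each intermediate tensor's grad_fn attribute points at the graph node that produced it, which is how backward() knows the path from the loss back to the parameters.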