- `optimizer.step()` performs a parameter update based on the current gradient (stored in the `.grad` attribute of a parameter) and the update rule. As an example, the update rule for SGD is defined here: https://github.com/pytorch/pytorch/blob/cd9b27231b51633e76e28b6a34002ab83b0660fc/torch/optim/sgd.py#L63.
- Calling `.backward()` multiple times accumulates the gradient (by addition) for each parameter. This is why you should call `optimizer.zero_grad()` after each `.step()` call (see the sketch after this list). Note that following the first `.backward()` call, a second call is only possible after you have performed another forward pass.
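
To make the two points above concrete, here is a minimal sketch (the parameter value and learning rate are made up just for illustration) showing how `.backward()` accumulates into `.grad`, how `.step()` then applies the plain SGD rule `w <- w - lr * w.grad`, and how `zero_grad()` resets the accumulation:

```python
import torch

# A single scalar parameter so the numbers are easy to follow.
w = torch.nn.Parameter(torch.tensor([2.0]))
opt = torch.optim.SGD([w], lr=0.1)

# First forward/backward: loss = w^2, so dloss/dw = 2*w = 4.
loss = (w ** 2).sum()
loss.backward()
print(w.grad)  # tensor([4.])

# Second forward pass and backward without zero_grad():
# the new gradient is added, so .grad now holds 4 + 4 = 8.
loss = (w ** 2).sum()
loss.backward()
print(w.grad)  # tensor([8.])

# step() applies the SGD rule: w <- w - lr * w.grad = 2 - 0.1 * 8 = 1.2
opt.step()
print(w)       # tensor([1.2000], ...)

# Reset the accumulated gradient before the next iteration.
# Depending on the PyTorch version, .grad is zeroed or set to None.
opt.zero_grad()
print(w.grad)
```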
So for your first question: the update is not based on the "closest" call but on the `.grad` attribute. How you calculate the gradient is up to you.
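
As a contrived example of that last point, `step()` will happily use a gradient you wrote into `.grad` by hand, without any `backward()` call at all (values again made up for illustration):

```python
import torch

w = torch.nn.Parameter(torch.tensor([1.0]))
opt = torch.optim.SGD([w], lr=0.1)

# No backward() anywhere: write the gradient manually.
w.grad = torch.tensor([3.0])

# step() only looks at .grad, so it applies w <- 1.0 - 0.1 * 3.0 = 0.7
opt.step()
print(w)  # tensor([0.7000], ...)
```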