Confusion about how loss.backward() and optimizer.step() are related

I’m new to PyTorch and just starting to play around with it. I would like to be able to access the gradient of the loss function during the training of a neural network in order to do calculations on it. For example, it would be nice to monitor the norm of the gradient vector throughout an Adam-based optimization.

I found some code online that trains a neural network (in this case a GAN, so there are actually two networks). The optimization step for the discriminator loss (d_loss) looks like

d_loss.backward()
optimizer_D.step()

and similarly for the generator. This updates the parameters correctly, and the model trains properly.

I’m confused because I can’t seem to access the gradient of d_loss. For example, d_loss.grad returns None. On the one hand, if I check d_loss.is_leaf, I find that it is False, and so the gradient should be None since this is not a leaf variable. On the other hand, I am confused as to why optimizer_D.step() works - where is it finding the gradient of d_loss?

Beyond understanding this puzzle, what do I need to change so that I can access d_loss.grad?

A tensor’s .grad holds the gradient of the quantity you called backward() on (d_loss here) with respect to that tensor, i.e. how much d_loss would change if you perturbed that tensor. So d_loss itself doesn’t have a gradient; it’s the leaf parameters that fed into it that accumulate gradients.
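
Here is a minimal sketch of that behavior, using a made-up one-layer model in place of the discriminator (the names are just for illustration):

import torch
import torch.nn as nn

model = nn.Linear(4, 1)              # stand-in for the discriminator
x = torch.randn(8, 4)
d_loss = model(x).mean()             # some scalar loss

d_loss.backward()

print(d_loss.is_leaf)     # False: d_loss is computed from the parameters
print(d_loss.grad)        # None: .grad is only populated on leaf tensors
print(model.weight.grad)  # a tensor: the gradient of d_loss w.r.t. the weights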

The optimizer is able to find the gradients because it was constructed with the parameters of the module you are optimizing (something like optimizer_D = torch.optim.Adam(discriminator.parameters(), ...)). backward() accumulates gradients into each of those parameters’ .grad fields, and optimizer_D.step() loops over those same parameters and reads their .grad to update them.
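
So if what you actually want is the norm of the gradient vector during training, loop over those same parameters between backward() and step(). A rough sketch, assuming your discriminator module is called discriminator:

d_loss.backward()

# Global L2 norm of the gradient over all discriminator parameters.
total_norm = 0.0
for p in discriminator.parameters():
    if p.grad is not None:
        total_norm += p.grad.norm(2).item() ** 2
total_norm = total_norm ** 0.5
print(f"grad norm: {total_norm:.4f}")

optimizer_D.step()

If you happen to be clipping gradients anyway, torch.nn.utils.clip_grad_norm_ returns this total norm, so you can log its return value instead.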