The following is a quote from this GitHub issue, https://github.com/pytorch/pytorch/issues/5349, which is about the ability to automatically detect variables for which a gradient is not needed. The quote:
Suppose we are training a GAN, with a very simple generator G(z) = tanh(W₁z + b₁) and discriminator D(x) = W₂x + b₂. The loss is a function of D(G(z)), and when calling loss.backward(), all of the gradients in the network are computed. However, during GAN training, the generator and discriminator are trained separately in alternation, therefore
When training D, the gradients do not need to back propagate to G.
When training G, we need only ∂D/∂x, but not ∂D/∂W₂ or ∂D/∂b₂.
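To make the two cases in the quote concrete, here is a minimal sketch of how each one can be expressed with current PyTorch calls (`Tensor.detach()` and `Tensor.requires_grad_()`). The names G, D and z follow the quote. The layer sizes, the batch size and the placeholder loss (a plain mean) are assumptions chosen only for illustration, and the snippet is not taken from the tutorial.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy networks from the quote: G(z) = tanh(W1 z + b1), D(x) = W2 x + b2.
# The sizes (4 -> 3 -> 1) and the batch size 8 are arbitrary assumptions.
G = nn.Sequential(nn.Linear(4, 3), nn.Tanh())
D = nn.Linear(3, 1)
z = torch.randn(8, 4)

# Case 1, training D: detach() cuts the graph at G's output,
# so backward() never propagates into G.
loss_d = D(G(z).detach()).mean()  # placeholder loss, not a real GAN loss
loss_d.backward()
g_grads_after_d_step = [p.grad for p in G.parameters()]
d_has_grad_after_d_step = [p.grad is not None for p in D.parameters()]

# Case 2, training G: temporarily freeze D's parameters so that only the
# gradient with respect to D's input is computed, not dD/dW2 or dD/db2.
for p in D.parameters():
    p.grad = None               # clear the gradients from the D step
    p.requires_grad_(False)

loss_g = D(G(z)).mean()         # placeholder loss again
loss_g.backward()

d_grads_after_g_step = [p.grad for p in D.parameters()]
g_has_grad_after_g_step = [p.grad is not None for p in G.parameters()]

# Unfreeze D so that it can be trained in the next D step.
for p in D.parameters():
    p.requires_grad_(True)
```

After the D step, the parameters of G have no gradient at all; after the G step, the parameters of D have none, while the parameters of G do.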
The GitHub issue is old, and as far as I can tell there have been some API improvements since then that help to mitigate the problem. I have just started reading the PyTorch tutorials and was wondering whether this issue is relevant to this section of the docs: https://pytorch.org/tutorials/beginner/dcgan_faces_tutorial.html. Since the tutorial does not mention the issue explicitly, I want to ask: does the code in the tutorial compute these unnecessary gradients or not?