I have a question regarding the
.backward() calls in the DCGAN tutorial.
When training the discriminator, an error for the real data,
errD_real, and an error for the fake data,
errD_fake, are calculated. Each of these has
.backward() called on it before the
.step() method is called on the optimiser.
How does this work? Wouldn't the second
.backward() call overwrite the gradient values stored in the
.grad attributes of the parameters?
Thanks for your help!
The backward() call accumulates the gradients in the
.grad attributes of the parameters involved, which is also why you have to call
optimizer.zero_grad() (or model.zero_grad()) before calculating the gradients of a new iteration.
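To make the accumulation concrete, here is a minimal sketch (not the tutorial's actual DCGAN code; the losses are stand-ins for errD_real and errD_fake) showing that two backward() calls sum into .grad before a single step():

```python
import torch

# A single parameter standing in for the discriminator's weights.
w = torch.tensor([2.0], requires_grad=True)
opt = torch.optim.SGD([w], lr=0.1)

opt.zero_grad()              # clear gradients from the previous iteration
loss_real = (w * 3.0).sum()  # stand-in for errD_real; d(loss)/dw = 3
loss_real.backward()         # w.grad is now tensor([3.])
loss_fake = (w * 4.0).sum()  # stand-in for errD_fake; d(loss)/dw = 4
loss_fake.backward()         # gradients accumulate: w.grad is now tensor([7.])
print(w.grad)                # tensor([7.])
opt.step()                   # the single step uses the summed gradient
```

The second backward() therefore adds to, rather than overwrites, the stored gradients.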
Would this therefore be the same as calling
(errD_real + errD_fake).backward()?
Yes, the resulting gradient would be the same, but you would use more memory, since both computation graphs need to be kept alive until the final
backward() operation is called. If you call
backward() separately, the intermediate tensors are freed right after each call. You would thus save memory but pay for it with two backward passes.
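A quick sketch (again with a toy parameter, not the tutorial's code) verifying that the two strategies produce identical gradients:

```python
import torch

# Strategy A: two separate backward() calls (as in the DCGAN tutorial).
w_a = torch.ones(1, requires_grad=True)
(w_a ** 2).sum().backward()        # contributes d/dw w^2 = 2 at w = 1
(w_a * 3.0).sum().backward()       # contributes 3; accumulated grad = 5

# Strategy B: sum the losses first, then one backward() call.
w_b = torch.ones(1, requires_grad=True)
((w_b ** 2) + (w_b * 3.0)).sum().backward()

print(w_a.grad, w_b.grad)          # both tensor([5.])
```

Strategy B holds both graphs in memory until the single backward(); Strategy A frees each graph as soon as its backward() finishes.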