DCGAN Tutorial: Multiple backward calls

James_e · December 5, 2021, 10:25pm

I have a question regarding the .backward() calls in the DCGAN tutorial.

When training the discriminator, an error for the real data, errD_real, and an error for the fake data, errD_fake, are calculated. Each of these has .backward() called on it before the .step() method is called on the optimiser.

How does this work? Would the second .backward() call not overwrite the gradient values stored in the .grad attribute of the optimiser?

Thanks for your help!

ptrblck · December 6, 2021, 1:06am

No, each backward() call accumulates the gradients in the .grad attribute of the parameters that were used, which is also why you have to call optimizer.zero_grad() or model.zero_grad() before calculating the gradients of a new iteration.
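
To make the accumulation visible, here is a minimal sketch. The parameter w, the two toy losses, and the SGD optimizer are made up for illustration and are not taken from the tutorial.

```python
import torch

# Hypothetical single parameter, used only for illustration.
w = torch.tensor([1.0, 2.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

loss_a = (w * 3.0).sum()  # d(loss_a)/dw = [3, 3]
loss_b = (w ** 2).sum()   # d(loss_b)/dw = 2 * w = [2, 4]

loss_a.backward()
grad_after_first = w.grad.clone()

loss_b.backward()         # adds to w.grad instead of overwriting it
grad_after_second = w.grad.clone()

print(grad_after_first)   # tensor([3., 3.])
print(grad_after_second)  # tensor([5., 7.])

# Clear the accumulated gradient before the next iteration.
# Depending on the PyTorch version, w.grad is now None or all zeros.
optimizer.zero_grad()
```

The second printed gradient is the sum of both individual gradients, which is what the optimizer would use in .step().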

James_e · December 6, 2021, 1:41pm

Would this therefore be the same as calling (errD_real + errD_fake).backward()?

ptrblck · December 6, 2021, 9:14pm

Yes, the resulting gradient would be the same, but you would use more memory, since both computation graphs need to be kept alive until the final backward() call. If you call backward() separately, the intermediate tensors of each graph are freed right after its backward pass. You would thus save memory, but pay for it with two backward operations.
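
The equivalence of the two variants can be checked with a rough sketch like the one below. The tiny netD, the random real and fake batches, and the grads() helper are hypothetical stand-ins, not the tutorial's actual discriminator or data.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for the discriminator (not the tutorial's network).
netD = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1), nn.Sigmoid())
criterion = nn.BCELoss()

real = torch.randn(5, 4)         # made-up "real" batch
fake = torch.randn(5, 4)         # made-up "fake" batch
real_labels = torch.ones(5, 1)
fake_labels = torch.zeros(5, 1)

def grads():
    # Copy the current gradients of all discriminator parameters.
    return [p.grad.clone() for p in netD.parameters()]

# Variant 1: two separate backward() calls; the gradients accumulate.
netD.zero_grad()
errD_real = criterion(netD(real), real_labels)
errD_real.backward()             # graph for the real batch is freed here
errD_fake = criterion(netD(fake), fake_labels)
errD_fake.backward()
separate = grads()

# Variant 2: one backward() call on the summed loss; both graphs stay
# alive until this call finishes.
netD.zero_grad()
errD_real = criterion(netD(real), real_labels)
errD_fake = criterion(netD(fake), fake_labels)
(errD_real + errD_fake).backward()
combined = grads()

same = all(torch.allclose(a, b, atol=1e-6) for a, b in zip(separate, combined))
print(same)
```

The comparison uses torch.allclose rather than exact equality, since the order of floating-point additions can differ slightly between the two variants.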