When training a GAN, why do we not need to zero_grad the discriminator?

(Orhan) #1

In the DCGAN example that can be found here, while training the generator network after training the discriminator network, we do not perform netG.zero_grad() again. However, doesn’t this accumulate the gradients with respect to the real data in netD (line 208), or the gradients from the previous feeding of fake data (line 217)? Does the former not happen because the input tensor is different (real/fake), and does the latter not happen because we performed a detach (line 215)?


In the discriminator’s update step (line 208), the generator is not part of the computation graph, so the backward pass does not calculate any gradients for it.
In line 217 the input to the discriminator is detached, as you already observed, so the backward call on errD_fake also does not calculate gradients for the generator.

Before updating the generator (line 225 etc.), the gradients are zeroed, so it looks alright.
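The structure described above can be sketched in a minimal training step. The tiny linear modules below stand in for the real DCGAN architectures, and the names (netD, netG, criterion) mirror the example but are illustrative assumptions, not the original code:

```python
import torch
import torch.nn as nn

# Minimal sketch of one GAN training step, mirroring the structure
# discussed above: zero D, accumulate real + fake losses, step D;
# then zero G, backward through the *non-detached* fake, step G.
torch.manual_seed(0)
netD = nn.Sequential(nn.Linear(4, 1), nn.Sigmoid())
netG = nn.Linear(2, 4)
criterion = nn.BCELoss()
optD = torch.optim.SGD(netD.parameters(), lr=0.1)
optG = torch.optim.SGD(netG.parameters(), lr=0.1)

real = torch.randn(8, 4)
noise = torch.randn(8, 2)

# ---- Update D: zero its grads once, then accumulate real + fake losses ----
netD.zero_grad()
errD_real = criterion(netD(real), torch.ones(8, 1))
errD_real.backward()

fake = netG(noise)
# detach() cuts the graph here, so this backward leaves netG untouched
errD_fake = criterion(netD(fake.detach()), torch.zeros(8, 1))
errD_fake.backward()
optD.step()

# Because of detach(), the generator still has no gradients at this point
assert all(p.grad is None for p in netG.parameters())

# ---- Update G: zero its grads *before* its own backward call ----
netG.zero_grad()
errG = criterion(netD(fake), torch.ones(8, 1))  # no detach: grads flow to netG
errG.backward()
optG.step()
```

The key point is that each network is zeroed right before its own backward/step pair, and the detach in the D update is what keeps the generator's gradient buffers clean.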

(Orhan) #3

Thank you very much, this answer helped me understand the workings of autograd better.

(Ahmed Mamoud) #4

Just one more question regarding your answer: when optimizing the discriminator, the first call to backward() should store gradients, which would then be accumulated with those calculated by the second call to backward() when optimizing the generator … is this true?


While optimizing the discriminator, you are using a real and a fake input.
Both inputs create a loss, and both losses accumulate gradients in the discriminator.
The fake input is detached from the generator (while updating the discriminator), so the generator won’t see any gradients.

(Ahmed Mamoud) #6

Yes, but what about the second call to backward() for optimizing the generator? My understanding is that there would be gradients from the first call (when optimizing the discriminator), and these are now added to the gradients of the second call, i.e., the generator would be optimized by the gradients of the second backward() call (the correct gradients) plus the gradients from the first call (when the generator was detached).

If that’s true, can we apply “zero_grad” to the discriminator before using it to optimize the generator?


The generator update does not have a second backward call. Could you point me to the line of code?

While optimizing the discriminator, you won’t compute any gradients in the generator.
Since the fake input was detached from the generator, no gradients will be created in the generator itself.
You can check this by calling print(netG.some_layer.weight.grad) after the discriminator was updated (in the first iteration; otherwise you might see gradients from the previous run).
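The suggested check can be sketched with small stand-in modules (the layer names and shapes here are illustrative, not the example's):

```python
import torch
import torch.nn as nn

# After D's backward pass on a *detached* fake batch, the generator's
# parameters have no gradients at all, while D's parameters do.
torch.manual_seed(0)
netG = nn.Linear(2, 4)
netD = nn.Sequential(nn.Linear(4, 1), nn.Sigmoid())
criterion = nn.BCELoss()

fake = netG(torch.randn(8, 2))
errD_fake = criterion(netD(fake.detach()), torch.zeros(8, 1))
errD_fake.backward()

print(netG.weight.grad)                 # -> None: detach() kept grads out of netG
print(netD[0].weight.grad is not None)  # -> True: netD did receive gradients
```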

(Ahmed Mamoud) #8

I was referring to “errG.backward()” … Now it is totally clear, thanks so much!