I was going through this example - https://github.com/pytorch/examples/blob/master/dcgan/main.py and I have a basic question.
fake = netG(noisev)
labelv = Variable(label.fill_(fake_label))
output = netD(fake.detach()) # detach to avoid training G on these labels
errD_fake = criterion(output, labelv)
errD_fake.backward()
D_G_z1 = output.data.mean()
errD = errD_real + errD_fake
optimizerD.step()
I understand why we call detach() on the variable fake: so that no gradients are computed for the Generator's parameters. My question is, does it actually matter, given that optimizerD.step() is only going to update the parameters associated with the Discriminator anyway?
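To make the question concrete, here is a minimal, self-contained sketch of what detach() changes. The nn.Linear stand-ins for G and D are purely illustrative (not the DCGAN architectures), and it uses plain tensors rather than Variable:

import torch
import torch.nn as nn

netG = nn.Linear(4, 4)               # toy stand-in for the generator
netD = nn.Linear(4, 1)               # toy stand-in for the discriminator
criterion = nn.BCEWithLogitsLoss()
noise = torch.randn(8, 4)
label = torch.zeros(8, 1)            # fake_label

# Case 1: no detach. backward() walks through D *and* G, so G's
# parameters receive gradients even though optimizerD would never use them.
fake = netG(noise)
criterion(netD(fake), label).backward()
print(netG.weight.grad is not None)  # True: gradients were computed for G

# Case 2: detach. The graph is cut at fake, so backward() stops at D.
netG.weight.grad = None              # reset for a clean comparison
fake = netG(noise)
criterion(netD(fake.detach()), label).backward()
print(netG.weight.grad)              # None: no gradient computed for G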
Besides, in the next step, when we update the parameters of the Generator, we first call netG.zero_grad(), which clears all previously computed gradients anyway. Moreover, when we update the parameters of the G network, we do output = netD(fake), this time without detach. Why? (The G-update step is sketched below for reference.)
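For context, here is a rough paraphrase (not a verbatim copy) of the G-update step from the linked script, using the same variable names as the snippet above:

netG.zero_grad()
labelv = Variable(label.fill_(real_label))  # fake images are labelled "real" for the generator's loss
output = netD(fake)                         # no detach here: gradients must flow back through D into G
errG = criterion(output, labelv)
errG.backward()
D_G_z2 = output.data.mean()
optimizerG.step()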
So why is detaching fake (line 3 of the first snippet, output = netD(fake.detach())) necessary at all?