I was going through this example - https://github.com/pytorch/examples/blob/master/dcgan/main.py and I have a basic question.
```python
fake = netG(noisev)
labelv = Variable(label.fill_(fake_label))
output = netD(fake.detach())  # detach to avoid training G on these labels
errD_fake = criterion(output, labelv)
errD_fake.backward()
D_G_z1 = output.data.mean()
errD = errD_real + errD_fake
optimizerD.step()
```
I understand why we call detach() on the variable fake: so that no gradients are computed for the Generator's parameters. My question is, does it actually matter, since optimizerD.step() is only going to update the parameters associated with the Discriminator anyway?
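For context, the two optimizers in the example are set up roughly like this (paraphrased; netG and netD are the networks defined earlier in the script, and lr / beta1 stand in for its command-line options), each over its own network's parameters only:

```python
import torch.optim as optim

# Each optimizer only ever holds its own network's parameters,
# so optimizerD.step() can never modify the Generator's weights.
optimizerD = optim.Adam(netD.parameters(), lr=lr, betas=(beta1, 0.999))
optimizerG = optim.Adam(netG.parameters(), lr=lr, betas=(beta1, 0.999))
```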
Besides, in the next step, when we update the Generator's parameters, we first call netG.zero_grad(), which clears all previously computed gradients anyway. Moreover, when we update the parameters of the G network, we do output = netD(fake). Here we are not using detach. Why?
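For reference, the Generator update I am asking about looks roughly like this in the example (paraphrased; names such as real_label and D_G_z2 come from the same script):

```python
netG.zero_grad()
labelv = Variable(label.fill_(real_label))  # fake labels count as "real" for the generator loss
output = netD(fake)                         # note: no detach() this time
errG = criterion(output, labelv)
errG.backward()
D_G_z2 = output.data.mean()
optimizerG.step()
```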
So, why is detaching (line 3 of the first snippet) necessary in the above code?