I was going through this example - https://github.com/pytorch/examples/blob/master/dcgan/main.py and I have a basic question.
fake = netG(noisev)
labelv = Variable(label.fill_(fake_label))
output = netD(fake.detach()) # detach to avoid training G on these labels
errD_fake = criterion(output, labelv)
errD_fake.backward()
D_G_z1 = output.data.mean()
errD = errD_real + errD_fake
optimizerD.step()
I understand why we call detach() on the variable fake: so that no gradients are computed for the Generator's parameters. My question is, does it actually matter, given that optimizerD.step() is only going to update the parameters associated with the Discriminator anyway?
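To make the question concrete, here is a minimal, self-contained sketch of what detach() changes. The nn.Linear stand-ins for G and D are purely illustrative (not the DCGAN architectures), and it uses plain tensors rather than Variable:

import torch
import torch.nn as nn

netG = nn.Linear(4, 4)               # toy stand-in for the generator
netD = nn.Linear(4, 1)               # toy stand-in for the discriminator
criterion = nn.BCEWithLogitsLoss()
noise = torch.randn(8, 4)
label = torch.zeros(8, 1)            # fake_label

# Case 1: no detach. backward() walks through D *and* G, so G's
# parameters receive gradients even though optimizerD would never use them.
fake = netG(noise)
criterion(netD(fake), label).backward()
print(netG.weight.grad is not None)  # True: gradients were computed for G

# Case 2: detach. The graph is cut at fake, so backward() stops at D.
netG.weight.grad = None              # reset for a clean comparison
fake = netG(noise)
criterion(netD(fake.detach()), label).backward()
print(netG.weight.grad)              # None: no gradient computed for G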
Besides, in the next step, when we update the parameters of the Generator, we first call netG.zero_grad(), which clears all previously computed gradients anyway. Moreover, when we update the parameters of the G network, we do output = netD(fake), this time without detach. Why? (The G-update step is sketched below for reference.)
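For context, here is a rough paraphrase (not a verbatim copy) of the G-update step from the linked script, using the same variable names as the snippet above:

netG.zero_grad()
labelv = Variable(label.fill_(real_label))  # fake images are labelled "real" for the generator's loss
output = netD(fake)                         # no detach here: gradients must flow back through D into G
errG = criterion(output, labelv)
errG.backward()
D_G_z2 = output.data.mean()
optimizerG.step()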
So why is detaching fake (line 3 of the first snippet, output = netD(fake.detach())) necessary at all?