In the DCGAN example, we first train the D network with real and fake labels. After that, we train the G network without updating the D network. Why doesn't the DCGAN code set requires_grad = False on D when training the G network? This is what I had in mind:
for param in netD.parameters():
    param.requires_grad = True
# train with real
...
# train with fake
...
# -------- Update G network --------
# Do not update D network
for param in netD.parameters():
    param.requires_grad = False
netG.zero_grad()
output = netD(fake)
errG = criterion(output, label)
errG.backward()
optimizerG.step()
If the data fed into D is detached from the graph (a plain tensor rather than part of G's computation), then calling backward() on D's output cannot influence the gradients of G. So there is no need to set requires_grad = False when training D; the fake.detach() already does this.
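The detach() point can be checked directly. A minimal sketch, with toy one-layer stand-ins for G and D (the tiny Linear modules here are hypothetical, just to illustrate): after backward() through D on a detached fake batch, D has gradients and G has none.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the DCGAN generator and discriminator.
netG = nn.Linear(2, 2)
netD = nn.Linear(2, 1)

noise = torch.randn(4, 2)
fake = netG(noise)

# Train D on the fake batch: detach() cuts the graph back to G,
# so backward() through D leaves G's gradients untouched.
out = netD(fake.detach())
out.sum().backward()

print(netD.weight.grad is not None)  # True: D received gradients
print(netG.weight.grad is None)      # True: G received none
```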
Great. So while training G, if I want to update only the G network while still feeding data forward through the D network, I still need to set param.requires_grad = False. Am I right?
Note that this is different from the case above, in which, during training of G, I also want to get a prediction from the D network. The code looks like:
for param in netD.parameters():
    param.requires_grad = False
netG.zero_grad()
output = netD(fake)
output2 = netG(fake)  # one more forward pass here
errG = criterion(output, label)
errD = criterion(output2, label)
errG.backward()
optimizerG.step()
No, you don't have to do that. In this example you are using two different optimizers (one per network), and each only receives the parameters of one network to optimize. When optimizing G there is no need to set param.requires_grad = False, as long as you don't call optimizerD.step().
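A quick sketch of why this works, again with hypothetical one-layer stand-ins: even without requires_grad = False, the backward pass fills D's .grad buffers, but optimizerG.step() moves only G's weights; D's stale gradients are simply cleared by netD.zero_grad() before D's own next update.

```python
import torch
import torch.nn as nn
import torch.optim as optim

netG = nn.Linear(2, 2)   # stand-in for the generator
netD = nn.Linear(2, 1)   # stand-in for the discriminator
optimizerG = optim.SGD(netG.parameters(), lr=0.1)  # holds only G's params

d_before = netD.weight.clone()
g_before = netG.weight.clone()

fake = netG(torch.randn(4, 2))
out = netD(fake)          # no detach: gradients flow through D into G
out.sum().backward()      # D's .grad buffers ARE populated here...
optimizerG.step()         # ...but only G's weights actually move

print(torch.equal(netD.weight, d_before))   # True: D's weights unchanged
print(torch.equal(netG.weight, g_before))   # False: G's weights changed
netD.zero_grad()          # clear D's stale grads before D's own update
```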
I'm confused about your code: why did you feed the fake image to both G and D? fake is already the output of netG(noise). And if you are training G and also want a prediction from D, why not just print it out? The prediction from D is already computed when training G. Here is the code from the example's G-training step:
netG.zero_grad()
label.fill_(real_label)
output = netD(fake)
print(output)  # here, if you want to see D's prediction, just print it
# the whole block trains only G because it only
# calls optimizerG.step()
errG = criterion(output, label)
errG.backward()
D_G_z2 = output.mean().item()
optimizerG.step()
I fed data to G to check whether the G network is being trained. I know the normal DCGAN just trains G and keeps D fixed at this step. But I just wanted to check whether, during the G step, the D and G networks get trained or not.
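One direct way to check this, instead of feeding extra data through G, is to snapshot every parameter before the G step and compare afterwards. A sketch with hypothetical tiny networks:

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Hypothetical tiny G and D just to illustrate the check.
netG = nn.Linear(2, 2)
netD = nn.Linear(2, 1)
optimizerG = optim.SGD(netG.parameters(), lr=0.1)

# Snapshot every parameter of both networks before the G step.
snapG = [p.clone() for p in netG.parameters()]
snapD = [p.clone() for p in netD.parameters()]

netG.zero_grad()
output = netD(netG(torch.randn(4, 2)))
output.sum().backward()
optimizerG.step()

g_changed = any(not torch.equal(p, s) for p, s in zip(netG.parameters(), snapG))
d_changed = any(not torch.equal(p, s) for p, s in zip(netD.parameters(), snapD))
print(g_changed, d_changed)  # True False: only G was trained
```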