Training G and D at the same time

Hi, I wrote a basic GAN and I'm wondering if it's possible to train G and D in the same step.
Most code I've seen alternates between training G and D, and I find this inefficient.

The idea is to have two optimizers like this:

opt_G = Adam(G.parameters())
opt_D = Adam(D.parameters())
loss_G = log(1-D(G(z)))
loss_D = log(D(G(z))) + D(real)

Gradients from loss_G depend on both G.parameters() and D.parameters(), but opt_G will only update G.parameters().

The problem with using one optimizer is that loss_G.backward() will also put values into the grad of D.parameters(). This is wrong, because loss_G would then affect D's parameters.
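To make that concrete, here is a minimal sketch with toy nn.Linear stand-ins for G and D (hypothetical shapes, and a sigmoid so the log is defined) showing that loss_G.backward() also fills D's grads, even though opt_G.step() would only ever touch G.parameters():

import torch
import torch.nn as nn

G = nn.Linear(8, 4)                               # toy generator
D = nn.Sequential(nn.Linear(4, 1), nn.Sigmoid())  # toy discriminator
opt_G = torch.optim.Adam(G.parameters())

z = torch.randn(16, 8)
loss_G = torch.log(1 - D(G(z))).mean()
loss_G.backward()

print(G.weight.grad is not None)     # True: loss_G fills G's grads
print(D[0].weight.grad is not None)  # True: it also fills D's grads
opt_G.step()                         # updates only G.parameters(); the unwanted
                                     # grads on D stay until someone zeroes them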

This is what I am thinking:

opt_G.zero_grad()
loss_G.backward(retain_graph=True)  # retain_graph needed if both losses come from one forward pass
opt_G.step()

opt_D.zero_grad()                   # also clears whatever loss_G.backward() left on D's grads
loss_D.backward()
opt_D.step()
  • I am not sure that the G update doesn't affect the D update in this code.

  • How can I use retain_graph or retain_variables for this? There is no documentation for retain_variables.

  • If loss_G.backward() could update only the grad of G.parameters() and skip D.parameters(), I could use a single optimizer (is this what retain_variables does?):

    opt.zero_grad()
    loss_G.backward(G.parameters())
    loss_D.backward(D.parameters())
    opt.step()

In your code, G and D are cooperating. They are not adversarial.

Thanks, fixed the sign.

After fixing the sign, you should notice that doing it with only one backward is not possible :slight_smile:

Right, so I have two loss.backward() calls but still one forward pass.
My question is whether changing the order of the G/D updates gives the same result.

(e.g. opt_G.step() updates G, and loss_D depends on G, so loss_D.backward() would calculate the grad based on the new G, which is wrong?)
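For reference, this is a toy example of the two-backwards-one-forward pattern I mean; without retain_graph=True on the first backward, the second backward fails because the shared graph has been freed:

import torch
import torch.nn as nn

D = nn.Linear(4, 1)
out = D(torch.randn(3, 4))          # one forward pass
loss_a = out.mean()                 # stands in for loss_G
loss_b = (out ** 2).mean()          # stands in for loss_D

loss_a.backward(retain_graph=True)  # keep the graph for the second backward
loss_b.backward()                   # would raise an error without retain_graph above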

Right. So in reality, you would want to calculate both gradients before updating any parameters. It is kinda tricky, but you can do it. In pseudocode:

fake = G(z)
fake_d = fake.detach()             # cut the graph here so D's loss can't reach G
fake_d.requires_grad = True

pred_fake = D(fake_d)
pred_fake_d = pred_fake.detach()   # cut the graph again so G's loss can't reach D
pred_fake_d.requires_grad = True

loss_G = log(1 - pred_fake_d)
# autograd.grad returns a tuple, hence the unpacking
grad_pred_fake, = autograd.grad(loss_G, pred_fake_d)
grad_fake, = autograd.grad(pred_fake, fake_d, grad_pred_fake, retain_graph=True)
fake.backward(grad_fake)           # pushes the gradient into G's parameters only

loss_D = log(pred_fake) + D(real)  # fake_d is detached, so this only touches D
loss_D.backward()
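For reference, here is a runnable version of the same sketch, with toy nn.Linear stand-ins (hypothetical sizes) and the losses written in the standard minimax form rather than the placeholder expressions above; the structure is the same, only the bookkeeping is filled in:

import torch
import torch.nn as nn
from torch import autograd

G = nn.Linear(8, 4)                               # toy generator
D = nn.Sequential(nn.Linear(4, 1), nn.Sigmoid())  # toy discriminator
opt_G = torch.optim.Adam(G.parameters())
opt_D = torch.optim.Adam(D.parameters())

z = torch.randn(16, 8)
real = torch.randn(16, 4)

opt_G.zero_grad()
opt_D.zero_grad()

fake = G(z)
fake_d = fake.detach().requires_grad_(True)             # D's part of the graph starts here
pred_fake = D(fake_d)
pred_fake_d = pred_fake.detach().requires_grad_(True)   # G's loss starts here, cut off from D

# generator gradient, pushed through D by hand and into G via fake.backward
loss_G = torch.log(1 - pred_fake_d).mean()
grad_pred_fake, = autograd.grad(loss_G, pred_fake_d)
grad_fake, = autograd.grad(pred_fake, fake_d, grad_pred_fake, retain_graph=True)
fake.backward(grad_fake)                                 # fills G's .grad only

# discriminator gradient; fake_d is detached, so this fills D's .grad only
loss_D = -(torch.log(1 - pred_fake) + torch.log(D(real))).mean()
loss_D.backward()

# both gradients were computed from the same forward, now update both
opt_G.step()
opt_D.step()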

Wow, nice trick. Detaching blocks the backprop.
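A tiny example of that, for reference:

import torch

x = torch.ones(1, requires_grad=True)
y = (2 * x).detach()          # same values as 2*x, but cut out of the graph
y.requires_grad_(True)        # grads can be collected at y, but stop here
z = (3 * y).sum()
z.backward()
print(y.grad)                 # tensor([3.])
print(x.grad)                 # None: the detach blocked backprop to x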