m75
December 14, 2017, 7:33am
#1
Hi, I wrote a basic GAN and I wonder whether it's possible to train G and D in the same step.
Most code I've seen alternates the G and D updates, and I find this inefficient.

The idea is to have two optimizers like this:

```
opt_G = Adam(G.parameters())
opt_D = Adam(D.parameters())
loss_G = log(1-D(G(z)))
loss_D = log(D(G(z))) + D(real)
```

Gradients from loss_G depend on both G.parameters() and D.parameters(), but opt_G will only update G.parameters().

The problem with using a single optimizer is that loss_G.backward() would also accumulate gradients into D.parameters(). That is wrong, because loss_G would then affect D's update.
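This accumulation is easy to see with tiny linear stand-ins for G and D (a sketch of my own, with a sigmoid added so the log is defined):

```python
import torch
from torch import nn

# Tiny linear stand-ins for G and D (illustrative, not the actual models).
G = nn.Linear(2, 2)
D = nn.Linear(2, 1)

z = torch.randn(3, 2)
# same shape of loss as above, with a sigmoid so the log is defined
loss_G = torch.log(1 - torch.sigmoid(D(G(z)))).mean()
loss_G.backward()

# backward() populated gradients for BOTH networks, even though only G
# should be updated by this loss
print(all(p.grad is not None for p in G.parameters()))  # True
print(all(p.grad is not None for p in D.parameters()))  # True
```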

This is what I am thinking:

```
opt_G.zero_grad()
loss_G.backward()
opt_G.step()
opt_D.zero_grad()
loss_D.backward()
opt_D.step()
```

I am not sure that the G update doesn't affect the D update in this code.

How can I use retain_graph or retain_variables for this? I can't find any documentation for retain_variables.

If loss_G.backward() could update grads only for G.parameters() and skip D.parameters(), I could use a single optimizer (is this what retain_variables does?):

```
opt.zero_grad()
loss_G.backward(G.parameters())
loss_D.backward(D.parameters())
opt.step()
```
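For what it's worth, `torch.autograd.grad` does let you compute gradients with respect to a chosen subset of parameters without touching anyone's `.grad` buffers — a sketch with hypothetical tiny models (the names are mine, not from the post):

```python
import torch
from torch import nn, autograd

# Tiny illustrative stand-ins for G and D.
G = nn.Linear(2, 2)
D = nn.Linear(2, 1)
z = torch.randn(3, 2)

loss_G = torch.log(1 - torch.sigmoid(D(G(z)))).mean()

# Gradients of loss_G w.r.t. G's parameters only. The chain rule still
# goes through D, but autograd.grad does not accumulate into D's .grad.
grads = autograd.grad(loss_G, list(G.parameters()))
for p, g in zip(G.parameters(), grads):
    p.grad = g  # fill in manually; a single optimizer step is now safe

print(all(p.grad is None for p in D.parameters()))  # True: D untouched
```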

SimonW
(Simon Wang)
December 14, 2017, 7:35am
#2
In your code, G and D are coordinating. They are not adversarial.

SimonW
(Simon Wang)
December 14, 2017, 8:11am
#4
After fixing the sign, you should notice that a single backward pass is not possible.

m75
December 14, 2017, 2:10pm
#5
Right, so I have two loss.backward() calls but still one forward pass.
My question is whether changing the order of the G/D updates gives the same result.

(e.g. opt_G.step() updates G, and loss_D depends on G, so loss_D.backward() would compute the grad based on the updated G, which is wrong?)

SimonW
(Simon Wang)
December 19, 2017, 5:39pm
#6
Right. So in reality you want to compute both gradients before updating any parameters. It is a bit tricky, but you can do, in pseudocode:

```
fake = G(z)
fake_d = fake.detach()
fake_d.requires_grad = True
pred_fake = D(fake_d)
pred_fake_d = pred_fake.detach()
pred_fake_d.requires_grad = True
loss_G = log(1 - pred_fake_d)
# grad of loss_G w.r.t. D's output (autograd.grad returns a tuple)
grad_pred_fake, = autograd.grad(loss_G, pred_fake_d)
# push that grad through D down to the fake sample; retain_graph keeps
# D's graph alive for loss_D below, and autograd.grad does not
# accumulate anything into D.parameters()
grad_fake, = autograd.grad(pred_fake, fake_d, grad_pred_fake, retain_graph=True)
# finally backprop through G only
fake.backward(grad_fake)
loss_D = log(pred_fake) + D(real)
loss_D.backward()
```
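Here is a runnable version of the same recipe, with tiny linear stand-ins for G and D and a sigmoid so the logs are defined (all names and shapes here are illustrative):

```python
import torch
from torch import nn, autograd

torch.manual_seed(0)
G = nn.Linear(2, 2)
D = nn.Sequential(nn.Linear(2, 1), nn.Sigmoid())
z, real = torch.randn(4, 2), torch.randn(4, 2)

fake = G(z)
fake_d = fake.detach().requires_grad_(True)
pred_fake = D(fake_d)
pred_fake_d = pred_fake.detach().requires_grad_(True)

loss_G = torch.log(1 - pred_fake_d).mean()
grad_pred_fake, = autograd.grad(loss_G, pred_fake_d)
grad_fake, = autograd.grad(pred_fake, fake_d, grad_pred_fake,
                           retain_graph=True)
fake.backward(grad_fake)  # fills G's grads only

# loss_G did not leak into D's .grad buffers:
assert all(p.grad is None for p in D.parameters())

loss_D = (torch.log(pred_fake) + D(real)).mean()
loss_D.backward()  # fills D's grads (fake_d is detached from G)
assert all(p.grad is not None for p in D.parameters())
```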


m75
December 19, 2017, 8:48pm
#7
Wow, nice trick. Detaching blocks the backprop.
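A minimal illustration of that blocking (my own toy example):

```python
import torch

x = torch.ones(3, requires_grad=True)
y = (x * 2).detach()    # y shares data with x*2 but drops the history
z = (y * 3).sum()
print(y.requires_grad)  # False: backprop cannot flow past the detach
print(z.requires_grad)  # False: nothing upstream requires grad anymore
```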