I want to take advantage of gradient accumulation to train a GAN with a larger effective batch size. I understand that for a normal network we just do something like

```
output = net(input)
loss = criterion(output, target_var)
loss = loss / accumulate_steps  # scale so the accumulated gradient matches the large-batch gradient
loss.backward()
if iterations % accumulate_steps == 0:  # assuming iterations starts at 1
    optimizer.step()
    optimizer.zero_grad()
```

But how can I implement this for a GAN? In GAN training we alternately update G and D. When computing the gradient of G, the loss backpropagates through D as well, so unwanted gradients accumulate in D's parameters. Normally we clear D's gradients at the start of each iteration, which conflicts with the gradient-accumulation strategy.

Here is my GAN training code without gradient accumulation:

```
# ----------
# Update G
# ----------
optimizer_G.zero_grad()
gen_imgs = generator(input_noise)
g_loss = adversarial_loss(discriminator(gen_imgs), label_real)
g_loss.backward()
optimizer_G.step()
# ----------
# Update D
# ----------
optimizer_D.zero_grad()  # this clears the unwanted gradient left in D by g_loss.backward()
real_loss = adversarial_loss(discriminator(real_imgs), label_real)
fake_loss = adversarial_loss(discriminator(gen_imgs.detach()), label_fake)
d_loss = (real_loss + fake_loss) / 2
d_loss.backward()
optimizer_D.step()
```
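One idea I'm considering (just a sketch, and I'm not sure it's the standard way): temporarily set `requires_grad = False` on D's parameters while backpropagating G's loss, so nothing unwanted lands in D's `.grad`; then both optimizers can safely accumulate over `accumulate_steps`. The tiny `nn.Linear` models and the variable names here are stand-ins for illustration only:

```python
import torch
import torch.nn as nn

# stand-in models just to make the sketch runnable
generator = nn.Linear(8, 8)
discriminator = nn.Linear(8, 1)
adversarial_loss = nn.BCEWithLogitsLoss()
optimizer_G = torch.optim.Adam(generator.parameters())
optimizer_D = torch.optim.Adam(discriminator.parameters())

accumulate_steps = 4
optimizer_G.zero_grad()
optimizer_D.zero_grad()

for iteration in range(1, 9):  # dummy data each iteration
    input_noise = torch.randn(2, 8)
    real_imgs = torch.randn(2, 8)
    label_real = torch.ones(2, 1)
    label_fake = torch.zeros(2, 1)

    # ---- G loss: freeze D so g_loss.backward() leaves D's .grad untouched ----
    for p in discriminator.parameters():
        p.requires_grad_(False)
    gen_imgs = generator(input_noise)
    g_loss = adversarial_loss(discriminator(gen_imgs), label_real)
    (g_loss / accumulate_steps).backward()
    for p in discriminator.parameters():
        p.requires_grad_(True)

    # ---- D loss: detach gen_imgs so no gradient flows back into G ----
    real_loss = adversarial_loss(discriminator(real_imgs), label_real)
    fake_loss = adversarial_loss(discriminator(gen_imgs.detach()), label_fake)
    d_loss = (real_loss + fake_loss) / 2
    (d_loss / accumulate_steps).backward()

    # step both optimizers only every accumulate_steps iterations
    if iteration % accumulate_steps == 0:
        optimizer_G.step()
        optimizer_G.zero_grad()
        optimizer_D.step()
        optimizer_D.zero_grad()
```

Because the graph still flows *through* D's frozen layers, G still receives its gradient; D simply records none. Does this look correct, or is there a more idiomatic pattern?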