Question on generator.zero_grad() in DCGAN

Accumulating gradients over multiple batches in GANs with low memory.

Hi everybody,
I would like to implement gradient updates over multiple minibatches (as described by @albanD in the answer linked above) in a GAN.

I would like to call optimizerG.step() only every 4 batches, say, and accumulate gradients for the generator as described in albanD's second example in the answer at the link above.
In the DCGAN example, netG.zero_grad() is called for every batch just before the generator update, and this prevents gradient accumulation.

Any way I can deal with it?
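One way to sketch this (hypothetical stand-in modules for DCGAN's netG/netD, and an assumed loss scaling) is to move both zero_grad() and step() behind a modulo check, so gradients accumulate across accum_steps batches before a single update:

```python
import torch
import torch.nn as nn

# Tiny stand-ins for DCGAN's netG / netD (assumptions, not the real models)
netG = nn.Linear(8, 4)
netD = nn.Linear(4, 1)
optimizerG = torch.optim.Adam(netG.parameters(), lr=2e-4)
criterion = nn.BCEWithLogitsLoss()

accum_steps = 4
optimizerG.zero_grad()                      # clear once before the window
for batch_idx in range(8):                  # stand-in for the data loader
    noise = torch.randn(16, 8)
    fake = netG(noise)
    output = netD(fake)
    real_label = torch.ones(16, 1)
    # scale the loss so the accumulated gradient matches one large batch
    errG = criterion(output, real_label) / accum_steps
    errG.backward()                         # gradients add up across batches
    if (batch_idx + 1) % accum_steps == 0:
        optimizerG.step()                   # update with 4 batches' gradients
        optimizerG.zero_grad()              # only now reset the accumulation
```

The key change versus the DCGAN tutorial loop is simply that zero_grad() is no longer unconditional per batch.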


I am not sure how to do that, actually.
Maybe you could run multiple minibatches updating just one network, then a few minibatches updating just the other?
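A minimal sketch of that suggestion, assuming hypothetical helper names (d_phase/g_phase) and toy modules in place of the real DCGAN networks: run k minibatches touching only D's optimizer, then k minibatches touching only G's:

```python
import torch
import torch.nn as nn

# Toy stand-ins for netG / netD (assumptions for illustration)
netG, netD = nn.Linear(8, 4), nn.Linear(4, 1)
optG = torch.optim.Adam(netG.parameters(), lr=2e-4)
optD = torch.optim.Adam(netD.parameters(), lr=2e-4)
criterion = nn.BCEWithLogitsLoss()
k = 4  # minibatches per phase (hypothetical choice)

def d_phase(real_batches):
    for real in real_batches:               # k minibatches updating only D
        optD.zero_grad()
        noise = torch.randn(real.size(0), 8)
        fake = netG(noise).detach()         # keep G frozen in this phase
        lossD = (criterion(netD(real), torch.ones(real.size(0), 1))
                 + criterion(netD(fake), torch.zeros(real.size(0), 1)))
        lossD.backward()
        optD.step()

def g_phase(n_batches, batch_size=16):
    for _ in range(n_batches):              # k minibatches updating only G
        optG.zero_grad()
        fake = netG(torch.randn(batch_size, 8))
        lossG = criterion(netD(fake), torch.ones(batch_size, 1))
        lossG.backward()
        optG.step()

d_phase([torch.randn(16, 4) for _ in range(k)])
g_phase(k)
```

Each phase draws its own fresh noise, so D and G are necessarily trained on different z samples here.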

Thank you, I had already started in this direction …
I stopped because I wasn't sure about optimizing using data from different batches: first computing D(G(fake_z_d)) (and then optimizing D), and then optimizing G under the loss from D(G(fake_z_g)) …

Is anyone aware of any “empirical rule” saying one should optimize D and G using z from the same batch?

(I think it shouldn’t matter … looking forward to posting an MWE when and if I get it working.)