Question on generator.zero_grad() in DCGAN

Accumulating gradients over multiple batches in GANs with low memory.

Hi everybody,
I would like to implement gradient updates over multiple minibatches (as described by @albanD in the answer linked above) in a GAN.

I would like to call optimizerG.step() only every 4 batches, say, and accumulate gradients for the generator as described in albanD's second example in the answer at the link above.
In the DCGAN example, netG.zero_grad() is called for every batch just before the generator update, and this prevents gradient accumulation.

Any way I can deal with it?
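One way to sketch this (hypothetical stand-in modules for DCGAN's netG/netD, and an assumed loss scaling) is to move both zero_grad() and step() behind a modulo check, so gradients accumulate across accum_steps batches before a single update:

```python
import torch
import torch.nn as nn

# Tiny stand-ins for DCGAN's netG / netD (assumptions, not the real models)
netG = nn.Linear(8, 4)
netD = nn.Linear(4, 1)
optimizerG = torch.optim.Adam(netG.parameters(), lr=2e-4)
criterion = nn.BCEWithLogitsLoss()

accum_steps = 4
optimizerG.zero_grad()                      # clear once before the window
for batch_idx in range(8):                  # stand-in for the data loader
    noise = torch.randn(16, 8)
    fake = netG(noise)
    output = netD(fake)
    real_label = torch.ones(16, 1)
    # scale the loss so the accumulated gradient matches one large batch
    errG = criterion(output, real_label) / accum_steps
    errG.backward()                         # gradients add up across batches
    if (batch_idx + 1) % accum_steps == 0:
        optimizerG.step()                   # update with 4 batches' gradients
        optimizerG.zero_grad()              # only now reset the accumulation
```

The key change versus the DCGAN tutorial loop is simply that zero_grad() is no longer unconditional per batch.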


I am not sure how to do that, actually.
Maybe you could run multiple minibatches updating just one network, then a few minibatches updating just the other?
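A minimal sketch of that suggestion, assuming hypothetical helper names (d_phase/g_phase) and toy modules in place of the real DCGAN networks: run k minibatches touching only D's optimizer, then k minibatches touching only G's:

```python
import torch
import torch.nn as nn

# Toy stand-ins for netG / netD (assumptions for illustration)
netG, netD = nn.Linear(8, 4), nn.Linear(4, 1)
optG = torch.optim.Adam(netG.parameters(), lr=2e-4)
optD = torch.optim.Adam(netD.parameters(), lr=2e-4)
criterion = nn.BCEWithLogitsLoss()
k = 4  # minibatches per phase (hypothetical choice)

def d_phase(real_batches):
    for real in real_batches:               # k minibatches updating only D
        optD.zero_grad()
        noise = torch.randn(real.size(0), 8)
        fake = netG(noise).detach()         # keep G frozen in this phase
        lossD = (criterion(netD(real), torch.ones(real.size(0), 1))
                 + criterion(netD(fake), torch.zeros(real.size(0), 1)))
        lossD.backward()
        optD.step()

def g_phase(n_batches, batch_size=16):
    for _ in range(n_batches):              # k minibatches updating only G
        optG.zero_grad()
        fake = netG(torch.randn(batch_size, 8))
        lossG = criterion(netD(fake), torch.ones(batch_size, 1))
        lossG.backward()
        optG.step()

d_phase([torch.randn(16, 4) for _ in range(k)])
g_phase(k)
```

Each phase draws its own fresh noise, so D and G are necessarily trained on different z samples here.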

Thank you, I had already started in this direction …
I stopped because I wasn't sure about optimizing using data from different batches: first computing D(G(fake_z_d)) (and then optimizing D), and then optimizing G under the loss from D(G(fake_z_g)) …

Is anyone aware of any “empirical rule” saying one should optimize D and G using z from the same batch?

(I think it shouldn’t matter … looking forward to posting an MWE when and if I get it working.)