Yes, thanks, I thought about that as well; it is actually the method used by the WGAN-GP PyTorch implementation. But as I said, my case is even more complex. For example, what if you also want to optimize D, but with a different loss term? You would then need another optimizer, or you would have to pass the gradients around manually. With
no_backward_accumulation you could optimize G and D jointly very easily:
fakes = G(x)
scores = D(fakes)
generator_loss = some_criterion(scores)
discriminator_loss = another_criterion(D(fakes.detach()), D(reals))
loss = generator_loss + discriminator_loss
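To make the intent concrete, here is a minimal sketch of how the full training step could look, continuing the snippet above (G, D, x, reals, some_criterion, another_criterion as before). It assumes no_backward_accumulation were exposed as a context manager that blocks gradient accumulation into the given parameters while still letting gradients flow through them; that interface is only an assumption on my part, not an existing PyTorch API:

import torch

g_optimizer = torch.optim.Adam(G.parameters())
d_optimizer = torch.optim.Adam(D.parameters())

g_optimizer.zero_grad()
d_optimizer.zero_grad()

fakes = G(x)

# Hypothetical API: gradients still flow *through* D back into G,
# but nothing is accumulated into D's own .grad buffers here.
with no_backward_accumulation(D.parameters()):
    scores = D(fakes)
generator_loss = some_criterion(scores)

# D is optimized on its own loss term; here its parameter
# gradients accumulate as usual.
discriminator_loss = another_criterion(D(fakes.detach()), D(reals))

loss = generator_loss + discriminator_loss
loss.backward()   # a single backward populates gradients for both G and D

g_optimizer.step()
d_optimizer.step()

The point is that a single backward pass is enough: the generator loss only touches G's gradients, the discriminator loss only touches D's, and no requires_grad toggling or second optimizer pass is needed.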
I will try to find a solution to this problem, and if you think such a feature would be useful, I can submit a pull request.