One vs two optimizers for two generators

What is the difference, in theory, between using two separate optimizers for two networks (for example, two generators or two discriminators) and using a single optimizer, where we sum the losses of both networks and backpropagate the sum?
For example, in CycleGAN there are two optimizers for the two Ds and one for the two Gs.

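There’s no difference for any first-order optimizer, since these update each parameter from its own gradient only, so it’s your choice. For second-order optimizers like optim.LBFGS it matters, because the update depends on the whole set of parameters the optimizer was given.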
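
One way to check this empirically is the minimal sketch below. The two nn.Linear modules are hypothetical stand-ins for the generators, loss_fn is a made-up toy loss, and Adam is picked only as an example of a first-order optimizer. Each loss here depends on its own network only.

```python
import copy

import torch
from torch import nn, optim

torch.manual_seed(0)

# Hypothetical stand-ins for two generators; any two separate modules would do.
g1_a, g2_a = nn.Linear(4, 4), nn.Linear(4, 4)
# Exact copies, trained with the two-optimizer setup for comparison.
g1_b, g2_b = copy.deepcopy(g1_a), copy.deepcopy(g2_a)

x = torch.randn(8, 4)


def loss_fn(net, inputs):
    # Toy loss that depends only on the network passed in.
    return net(inputs).pow(2).mean()


# Setup A: one optimizer over both parameter sets, one summed loss.
opt_a = optim.Adam(list(g1_a.parameters()) + list(g2_a.parameters()), lr=1e-2)
# Setup B: one optimizer per network, one backward call per loss.
opt_b1 = optim.Adam(g1_b.parameters(), lr=1e-2)
opt_b2 = optim.Adam(g2_b.parameters(), lr=1e-2)

for _ in range(5):
    opt_a.zero_grad()
    (loss_fn(g1_a, x) + loss_fn(g2_a, x)).backward()
    opt_a.step()

    opt_b1.zero_grad()
    opt_b2.zero_grad()
    loss_fn(g1_b, x).backward()
    loss_fn(g2_b, x).backward()
    opt_b1.step()
    opt_b2.step()

# Compare the parameters of both setups after the same number of steps.
params_a = list(g1_a.parameters()) + list(g2_a.parameters())
params_b = list(g1_b.parameters()) + list(g2_b.parameters())
same = all(torch.allclose(pa, pb, atol=1e-6) for pa, pb in zip(params_a, params_b))
print(same)
```

If the two setups really are equivalent for this optimizer, same should come out True. The check says nothing about optim.LBFGS, which needs a closure and is not covered by this sketch.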

But doesn’t calling (lossG1 + lossG2).backward() mean that both generators are affected by the same summed loss, in contrast to having two optimizers and calling lossG1.backward() for one and lossG2.backward() for the other?
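
A small sketch that may help settle this by inspecting the gradients directly. The modules and losses are hypothetical toy stand-ins, not the actual CycleGAN losses. The assumption being tested: the gradient of a sum is the sum of the gradients, so a loss term that does not depend on a generator contributes nothing to that generator's gradient. A term that does pass through it, such as a cycle-style term g2(g1(x)), does contribute.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-ins for the two generators.
g1, g2 = nn.Linear(3, 3), nn.Linear(3, 3)
x = torch.randn(5, 3)


def grads(net):
    # Snapshot the current gradients of a network's parameters.
    return [p.grad.clone() for p in net.parameters()]


def match(a, b):
    return all(torch.allclose(u, v) for u, v in zip(a, b))


# Reference: gradient of g1 from its own loss only.
g1(x).pow(2).mean().backward()
separate = grads(g1)

# Case 1: summed loss, where the second term depends on g2 only.
g1.zero_grad()
g2.zero_grad()
loss_g1 = g1(x).pow(2).mean()
loss_g2 = g2(x).abs().mean()
(loss_g1 + loss_g2).backward()
independent_match = match(grads(g1), separate)

# Case 2: summed loss, where the second term also passes through g1.
g1.zero_grad()
g2.zero_grad()
loss_g1 = g1(x).pow(2).mean()
loss_cycle = (g2(g1(x)) - x).abs().mean()
(loss_g1 + loss_cycle).backward()
coupled_match = match(grads(g1), separate)

print(independent_match, coupled_match)
```

Under these assumptions, g1 gets the same gradient in case 1 as with a separate backward call, and a different one in case 2, where the extra term really does depend on g1.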