If GANs really do need separate process groups for G and D, that definitely needs to be in the docs. I’ve had some strange training results while using DDP, and this may be the cause.
@pietern Do you know whether the interference you mentioned would raise an exception, or just produce incorrect gradients?
I want to put together a test bed for this to see whether the gradients actually differ when using one process group versus two.
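A rough sketch of what such a test bed could look like. This is only one possible setup, with several assumptions. It uses PyTorch with the CPU `gloo` backend. The toy `nn.Linear` G and D, the WGAN-style loss, the port number, and the helper names `gan_step_grads`/`main` are all hypothetical placeholders. The sketch runs one D step and one G step twice. In the first run both DDP modules use the default process group. In the second run each module gets its own group from `dist.new_group`. The resulting gradients are then compared. D is called only once per backward, with the real and fake batches concatenated, because DDP by default doesn't expect a module's parameters to be used twice in a single backward pass.

```python
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def gan_step_grads(separate_groups: bool):
    """One D step and one G step on toy models; returns (D grads, G grads) as flat tensors."""
    world_size = dist.get_world_size()
    if separate_groups:
        # new_group must be called by every rank, in the same order.
        g_pg = dist.new_group(ranks=list(range(world_size)))
        d_pg = dist.new_group(ranks=list(range(world_size)))
    else:
        g_pg = d_pg = None  # None -> DDP uses the default process group

    torch.manual_seed(0)  # same initial weights in both configurations
    G = DDP(nn.Linear(4, 4), process_group=g_pg)  # hypothetical toy generator
    D = DDP(nn.Linear(4, 1), process_group=d_pg)  # hypothetical toy discriminator

    torch.manual_seed(100 + dist.get_rank())  # different data on each rank
    z, real = torch.randn(8, 4), torch.randn(8, 4)

    # D step: fake is detached, so this backward only runs D's gradient hooks.
    fake = G(z)
    out = D(torch.cat([real, fake.detach()]))  # one D forward per backward
    d_loss = out[8:].mean() - out[:8].mean()  # WGAN-style critic loss, for brevity
    d_loss.backward()
    d_grads = torch.cat([p.grad.flatten() for p in D.parameters()]).clone()

    # G step: this backward runs both D's and G's gradient hooks.
    D.zero_grad()
    g_loss = -D(fake).mean()
    g_loss.backward()
    g_grads = torch.cat([p.grad.flatten() for p in G.parameters()]).clone()
    return d_grads, g_grads


def main() -> bool:
    if not dist.is_initialized():
        # torchrun sets these; the defaults give a single-rank smoke test.
        os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
        os.environ.setdefault("MASTER_PORT", "29512")
        dist.init_process_group(
            "gloo",
            rank=int(os.environ.get("RANK", "0")),
            world_size=int(os.environ.get("WORLD_SIZE", "1")),
        )
    one_pg = gan_step_grads(separate_groups=False)
    two_pg = gan_step_grads(separate_groups=True)
    same = all(torch.allclose(a, b) for a, b in zip(one_pg, two_pg))
    if dist.get_rank() == 0:
        print("1 PG vs 2 PGs gradients match:", same)
    return same


if __name__ == "__main__":
    main()
```

Run as a plain script, it uses a single rank. That is only a smoke test, because with one rank the all-reduce can't change anything and the gradients trivially match. Launching it with something like `torchrun --nproc_per_node=2 testbed.py` exercises real all-reduces across ranks, which is where any interference between the G and D reducers would show up.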