Calling DistributedDataParallel on multiple Modules?

If it is indeed the case that GANs need separate process groups for the generator (G) and the discriminator (D), then that definitely needs to be in the docs. I’ve had some strange training results while using DDP, and this may be the cause.
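
For reference, here is roughly how I understand the two-group setup would look (a minimal sketch assuming an env:// rendezvous, e.g. via `torchrun`; `Generator` and `Discriminator` are placeholders for your own modules):

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group(backend="nccl")  # assumes env:// rendezvous (torchrun)
world = list(range(dist.get_world_size()))

# Two independent process groups over the same ranks, so the allreduce
# traffic for G and D goes through separate communicators.
pg_g = dist.new_group(ranks=world)
pg_d = dist.new_group(ranks=world)

local_rank = dist.get_rank() % torch.cuda.device_count()
g = DDP(Generator().to(local_rank), device_ids=[local_rank], process_group=pg_g)
d = DDP(Discriminator().to(local_rank), device_ids=[local_rank], process_group=pg_d)
```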

@pietern Do you know whether the interference you mention would raise an exception, or just silently produce incorrect gradients?

I want to put together a test bed for this and see whether the gradients actually differ when using one process group versus two, roughly along the lines of the sketch below.
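
Something like this is what I have in mind (sketch only, reusing the names from the snippet above; the tiny batch and fixed seed are just so the two runs are comparable):

```python
def grads_after_step(use_two_groups: bool):
    torch.manual_seed(0)
    pg_g = dist.new_group() if use_two_groups else None  # None -> default group
    pg_d = dist.new_group() if use_two_groups else None
    g = DDP(Generator().to(local_rank), device_ids=[local_rank], process_group=pg_g)
    d = DDP(Discriminator().to(local_rank), device_ids=[local_rank], process_group=pg_d)

    z = torch.randn(8, 100, device=local_rank)  # hypothetical latent size
    d(g(z)).mean().backward()  # fires allreduce hooks on both DDP instances
    return [p.grad.clone() for p in list(g.parameters()) + list(d.parameters())]

one_pg = grads_after_step(use_two_groups=False)
two_pg = grads_after_step(use_two_groups=True)
print(all(torch.allclose(a, b) for a, b in zip(one_pg, two_pg)))
```

If this prints `False`, that would point to the interference silently producing wrong gradients rather than raising an error.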