Two DDP models on the same GPU

Hi all!

I was wondering whether I can initialize two DDP models on the same device within the same process group, or whether this might cause any bugs, e.g. regarding memory allocation.
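For context, this is roughly the setup I mean (a minimal sketch; the model classes, sizes, and rank handling are just placeholders):

```python
# Sketch of the setup in question: two models on the same GPU, each wrapped
# in DDP, both using the default process group. Models/sizes are placeholders.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def setup(rank, world_size):
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

def build_models(rank):
    # Both models live on the same device and share the default process group.
    model_a = torch.nn.Linear(10, 10).to(rank)
    model_b = torch.nn.Linear(10, 10).to(rank)
    ddp_a = DDP(model_a, device_ids=[rank])
    ddp_b = DDP(model_b, device_ids=[rank])
    return ddp_a, ddp_b
```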

I would avoid that since it is an unsupported setting. Any reason for not using two separate process groups?
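For example, something along these lines (a rough sketch; the group and variable names are just illustrative):

```python
# Rough sketch: one process group per model, created over the same ranks and
# passed to DDP explicitly via the process_group argument.
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def build_models_with_groups(rank, world_size):
    ranks = list(range(world_size))
    pg_a = dist.new_group(ranks=ranks)  # group for the first model
    pg_b = dist.new_group(ranks=ranks)  # group for the second model

    model_a = torch.nn.Linear(10, 10).to(rank)
    model_b = torch.nn.Linear(10, 10).to(rank)

    ddp_a = DDP(model_a, device_ids=[rank], process_group=pg_a)
    ddp_b = DDP(model_b, device_ids=[rank], process_group=pg_b)
    return ddp_a, ddp_b
```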

Because I am using the NCCL backend, and here in the PyTorch docs it says:

“Using multiple process groups with the NCCL backend concurrently is not safe and the user should perform explicit synchronization in their application to ensure only one process group is used at a time. This means collectives from one process group should have completed execution on the device (not just enqueued since CUDA execution is async) before collectives from another process group are enqueued. See Using multiple NCCL communicators concurrently for more details.”

and I am a bit unsure how to introduce that explicit synchronization.
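To make that concrete, is this roughly what the docs mean? Just a sketch with placeholder tensors, where pg_a and pg_b would be the two process groups:

```python
# Sketch of explicit synchronization between two process groups: wait until
# the collective issued on one group has actually finished on the GPU before
# enqueueing a collective on the other group. Tensors are placeholders.
import torch
import torch.distributed as dist

def step_with_sync(tensor_a, tensor_b, pg_a, pg_b):
    # Collective on the first group (e.g., a gradient all-reduce).
    dist.all_reduce(tensor_a, group=pg_a)
    # Block the host until all work enqueued on this device has completed,
    # so nothing from pg_a is still running when pg_b's collective starts.
    torch.cuda.synchronize()
    # Now enqueue the collective on the second group.
    dist.all_reduce(tensor_b, group=pg_b)
    torch.cuda.synchronize()
```

Would a `torch.cuda.synchronize()` between the two collectives be enough here, or is something else needed?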

I am also trying to implement a GAN, so both process groups would need to communicate with each other. That's why I would require sub-process groups, I guess.