Disabling all reduce in Distributed Data Parallel

Hello, I’m trying to set up distributed model training. The Distributed Data Parallel documentation says that torch.nn.parallel.DistributedDataParallel performs the allreduce operation by itself, if I understood it correctly. Is it possible to disable this functionality so I can call allreduce manually? Or do I have to use something instead of DistributedDataParallel in this case?

Is it possible to disable this functionality so I can call all reduce manually?

Do you need to implement any customized logic in the allreduce? If so, I would recommend DDP comm hooks, which provide an interface for implementing a custom allreduce.
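For reference, here is a minimal sketch of registering a comm hook that reimplements a plain averaging allreduce; the wrapped model name `ddp_model` and the `local_rank` variable are assumptions, and you would replace the hook body with your own logic:

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def custom_allreduce_hook(state, bucket: dist.GradBucket) -> torch.futures.Future[torch.Tensor]:
    # Sum the bucket's flattened gradients across all ranks asynchronously,
    # then divide by the world size to get the average.
    tensor = bucket.buffer()
    fut = dist.all_reduce(tensor, op=dist.ReduceOp.SUM, async_op=True).get_future()

    def average(fut):
        return fut.value()[0] / dist.get_world_size()

    return fut.then(average)

# Assuming the model is already wrapped, e.g.:
# ddp_model = DDP(model, device_ids=[local_rank])
# ddp_model.register_comm_hook(state=None, hook=custom_allreduce_hook)
```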

Another option is the no_sync context manager, which disables the automatic allreduce; it then becomes your responsibility to run allreduce yourself.
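A minimal sketch of that pattern, assuming `ddp_model`, `optimizer`, and `inputs` already exist (hypothetical names):

```python
import torch.distributed as dist

# Backward inside no_sync() accumulates gradients locally; DDP skips its allreduce.
with ddp_model.no_sync():
    loss = ddp_model(inputs).sum()
    loss.backward()

# Manually allreduce and average the local gradients.
for param in ddp_model.parameters():
    if param.grad is not None:
        dist.all_reduce(param.grad, op=dist.ReduceOp.SUM)
        param.grad /= dist.get_world_size()

optimizer.step()
optimizer.zero_grad()
```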

Seems like no_sync is what I need. Thank you!