Synchronisation after Allreduce

vineeths · January 17, 2021, 4:26am

I know that torch.cuda.synchronize() waits until all queued operations on the GPU have completed. Does that also include communication operations?

More specifically, does calling torch.cuda.synchronize() right after an allreduce operation wait until the allreduce has completed?

tensor_reduce_op = torch.distributed.all_reduce(tensor=tensor, async_op=True)
torch.cuda.synchronize()
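
For comparison, here is a minimal sketch of the explicit alternative: waiting on the handle that all_reduce returns when async_op=True. It assumes a single-process group on the gloo (CPU) backend so that it runs without a GPU; the function name allreduce_and_wait and the temporary file-store path are hypothetical, chosen only for illustration. As far as I understand, wait() blocks the process for CPU collectives, while for CUDA collectives it only guarantees that the output is safe to use on the default stream, so this sketch does not by itself settle the torch.cuda.synchronize() question.

```python
import os
import tempfile

import torch
import torch.distributed as dist


def allreduce_and_wait():
    """Run an async all_reduce in a one-process gloo group and wait on its handle."""
    # Assumption: world_size=1 on the CPU "gloo" backend, purely so the
    # example is self-contained. A real job would have several ranks.
    with tempfile.TemporaryDirectory() as tmp_dir:
        store_path = os.path.join(tmp_dir, "dist_store")  # hypothetical path
        dist.init_process_group(
            backend="gloo",
            init_method=f"file://{store_path}",
            rank=0,
            world_size=1,
        )
        try:
            tensor = torch.ones(4)
            # async_op=True returns a work handle instead of blocking.
            work = dist.all_reduce(tensor, op=dist.ReduceOp.SUM, async_op=True)
            # Explicitly wait for the collective rather than relying on
            # torch.cuda.synchronize().
            work.wait()
            return tensor
        finally:
            dist.destroy_process_group()


if __name__ == "__main__":
    print(allreduce_and_wait())
```

With a single rank the sum leaves the tensor unchanged, so the value of the sketch is only in showing where the handle's wait() call goes.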