Hi, I’m wondering what is the main difference between torch.cuda.synchronize() and dist.barrier().
I know that the former blocks the CPU thread from proceeding until all previously enqueued GPU work has finished,
and the latter makes processes wait until every process reaches dist.barrier().
So I think those two have the same purpose… what’s the difference?
If I’m wrong, please feel free to correct me.
They operate on different axes. torch.cuda.synchronize() is a single-process, host-to-device synchronization: it blocks the calling CPU thread until all kernels previously enqueued on the current CUDA device (across all of its streams) have completed. It says nothing about other processes.

dist.barrier() is a multi-process synchronization used in a distributed setup: it blocks each process until every process in the group has entered the barrier. Even if one process has already finished all of its GPU work, it will still wait at the barrier until every other process reaches it before advancing. So the two calls are complementary rather than interchangeable: one synchronizes the host with its own device, the other synchronizes the processes with each other.
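To see why torch.cuda.synchronize() matters within a single process, here is a small timing sketch (the function name `time_gpu_matmul` and the matrix size are my own choices for illustration). CUDA kernel launches are asynchronous, so without a synchronize the timer measures only the launch, not the actual GPU work; the function returns None on CPU-only machines where there is nothing to demonstrate:

```python
import time
import torch

def time_gpu_matmul(n=1024):
    """Compare the (async) launch time of a matmul with its real runtime."""
    if not torch.cuda.is_available():
        return None  # no CUDA device: nothing to demonstrate
    x = torch.randn(n, n, device="cuda")
    torch.cuda.synchronize()                  # ensure setup work has finished
    t0 = time.perf_counter()
    y = x @ x                                 # enqueued asynchronously
    launch_time = time.perf_counter() - t0    # returns almost immediately
    torch.cuda.synchronize()                  # block host until the matmul is done
    real_time = time.perf_counter() - t0
    return launch_time, real_time

if __name__ == "__main__":
    print(time_gpu_matmul())
```

On a GPU machine, `launch_time` is typically far smaller than `real_time`, which is exactly the gap that synchronize closes. Note that this is purely local: no other process is involved.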
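And here is a minimal sketch of dist.barrier() across processes. It uses the gloo backend so it also runs on CPU-only machines, and spawns two local workers; the helper names (`worker`, `run_demo`), the sleep duration, and the port are my own choices for the demo. Rank 0 sleeps to simulate unfinished work, so rank 1 is forced to wait at the barrier even though it has nothing left to do:

```python
import os
import time
import torch.distributed as dist
import torch.multiprocessing as mp

def worker(rank, world_size, port):
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = port
    # gloo backend: works without a GPU, good enough to show barrier semantics
    dist.init_process_group("gloo", rank=rank, world_size=world_size)
    if rank == 0:
        time.sleep(2)  # pretend rank 0 still has work to do
    dist.barrier()     # every rank blocks here until ALL ranks have arrived
    dist.destroy_process_group()

def run_demo(port="29500"):
    start = time.time()
    mp.spawn(worker, args=(2, port), nprocs=2)
    return time.time() - start  # at least ~2s: rank 1 had to wait for rank 0

if __name__ == "__main__":
    print(f"demo took {run_demo():.2f}s")
</imports>
```

The whole run takes at least as long as the slowest rank, which is the point of a barrier: it aligns processes, not the host with its GPU.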