I’m trying to use torch.distributed.batch_isend_irecv(). The documentation says this operation is async, but it isn’t clear to me what that implies for data hazards: could it produce a read-after-write (RAW) violation, a write-after-read (WAR) violation, or both?
I’ve observed that if I do:
p2p_op_lst = [
    torch.distributed.P2POp(torch.distributed.irecv, x, peer=src, group=self.comm_group)
]
reqs = torch.distributed.batch_isend_irecv(p2p_op_lst)
x = a(x)
then the communication can overlap with kernel a, since the communication runs on a different stream from the one a runs on.
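For completeness, if I wanted to rule out that overlap (i.e., make sure the recv into x has finished before a reads it), my understanding is that req.wait() before the compute is the right tool. A minimal sketch with the same names as above (`a`, `src`, and the process group are assumed to come from the surrounding code; `recv_then_compute` is just a hypothetical helper of mine, not a torch API):

```python
import torch
import torch.distributed as dist

def recv_then_compute(x, a, src, comm_group):
    # Sketch only: assumes an initialized process group.
    op = dist.P2POp(dist.irecv, x, peer=src, group=comm_group)
    (req,) = dist.batch_isend_irecv([op])
    # With the NCCL backend, wait() makes the current stream wait for the
    # recv, so `a` is ordered after x is fully received (no RAW hazard).
    req.wait()
    return a(x)
```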
But with the opposite ordering I have never observed a WAR violation:
x = a(x)
reqs = torch.distributed.batch_isend_irecv(p2p_op_lst)
i.e. the recv never overlaps with a in this case.
Does torch.distributed’s async API guarantee that the overlap in the second case can never happen? Or does the user have to add explicit synchronization between the communication stream and the computation stream — and if so, how? req.wait() only synchronizes after the communication; it doesn’t order work before the communication starts.
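For context, this is the kind of “synchronization before communication” I have in mind, done by hand with CUDA events rather than through torch.distributed. This is only a sketch of my assumption, not something the docs describe: `comm_stream` is a hypothetical stream standing in for whatever stream the backend launches the recv on, and `ordered_handoff` is a made-up helper name.

```python
import torch

def ordered_handoff(x, a, comm_stream):
    # Hypothetical: order compute before communication with a CUDA event.
    x = a(x)  # enqueue the compute kernel on the current stream
    done = torch.cuda.Event()
    done.record(torch.cuda.current_stream())
    # Make the (hypothetical) communication stream wait until `a` has
    # finished, so a recv launched on it could not overwrite x early
    # (i.e., no WAR hazard on x).
    comm_stream.wait_event(done)
    return x
```

Is something equivalent to this already done internally when batch_isend_irecv is called after a, which would explain why I never see the overlap in the second case?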