When using torch.distributed point-to-point communication, is there any way to handle errors myself?

Can I implement something like the following:

On the send side:
call dist.isend() without caring whether it succeeds or fails, then carry on with the following work.
On the recv side:
call dist.irecv(), wait for a few seconds, and handle errors or timeouts myself without blocking the program (see the sketch below).
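Roughly, the receive side I have in mind would look like the sketch below. This is only the intended pattern, not working code: the poll_recv helper and the timing values are my own, and whether is_completed() actually behaves this way for NCCL point-to-point is exactly what I'm unsure about.

```python
import time
import torch.distributed as dist

def poll_recv(tensor, src, timeout_s=5.0, poll_interval_s=0.1):
    """Post an irecv and poll it until a deadline, never blocking the caller."""
    work = dist.irecv(tensor, src)
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if work.is_completed():   # non-blocking completion check
            return True
        time.sleep(poll_interval_s)
    return False  # timed out; the irecv is still outstanding
```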

I tried to put together a simple example:

dist.init_process_group("nccl",world_size=ompi_world_size,rank=ompi_world_rank,
                        init_method='file:///my_path/dist_file',timeout=timeout)
if not dist.is_initialized():
   raise RuntimeError("init fail") 
rank = dist.get_rank()
world_size = dist.get_world_size()
if not torch.cuda.is_available():
    raise RuntimeError("CUDA is not available")
torch.set_default_device('cuda:0')

vec = torch.rand(32)
res = None
success = False

if rank == 2:
    time.sleep(100)
else :
    if rank %2 == 0:
        dist.isend(vec,rank+1)
    else:
        res = dist.irecv(vec,rank-1)
    
    if rank % 2 != 0:
        for l in range(5):
            if res.is_success():
                success = True
            time.sleep(1)

print(f"result: {success}, {vec} in {rank} after send/recv")

dist.destroy_process_group()

Assume something goes wrong on rank 2; rank 3 is supposed to wait for 5 s and then give up.
But in practice, ranks 2 and 3 both block forever.
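For what it's worth, the only timeout-style handling I know of is the per-work wait(timeout=...), which (on the gloo backend at least, as far as I understand) raises instead of hanging; something like the sketch below, where recv_with_timeout is a hypothetical helper of mine. Whether anything equivalent works for NCCL point-to-point is what I'm trying to figure out.

```python
from datetime import timedelta
import torch.distributed as dist

def recv_with_timeout(tensor, src, timeout_s=5.0):
    """Post an irecv and wait with a per-work timeout instead of polling."""
    work = dist.irecv(tensor, src)
    try:
        work.wait(timeout=timedelta(seconds=timeout_s))
        return True
    except RuntimeError:
        # wait() raised because the work did not complete in time
        return False
```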

Repro: test_ptd_forum_195413.py · GitHub

cc. @H-Huang @kwen2501