Can ranks use different batching patterns when calling batched P2P communication?

Two ranks exchange data with a send and a recv. Rank 0 batches both operations into a single batch_isend_irecv call, while rank 1 issues them in two separate calls. Is the behaviour undefined?

import torch
import torch.distributed as dist

# Setup: NCCL backend, launched with two ranks (e.g. torchrun --nproc_per_node=2)
dist.init_process_group("nccl")
rank = dist.get_rank()
torch.cuda.set_device(rank)

send_tensor = torch.arange(2, dtype=torch.float32, device='cuda') + 2 * rank
recv_tensor = torch.randn(2, dtype=torch.float32, device='cuda')

if rank == 0:
    # Rank 0 batches the send and recv into a single call
    send_op = dist.P2POp(dist.isend, send_tensor, 1)
    recv_op = dist.P2POp(dist.irecv, recv_tensor, 1)
    reqs = dist.batch_isend_irecv([send_op, recv_op])
    for req in reqs:
        req.wait()
else:
    # Rank 1 issues the send and recv as two separate batched calls
    send_op = dist.P2POp(dist.isend, send_tensor, 0)
    recv_op = dist.P2POp(dist.irecv, recv_tensor, 0)
    reqs = dist.batch_isend_irecv([send_op])
    reqs += dist.batch_isend_irecv([recv_op])
    for req in reqs:
        req.wait()

Running this raises an ncclInternalError. Is this behaviour actually defined by NCCL, or is mismatched batching across ranks simply not allowed?

Traceback (most recent call last):
  File "test_comm.py", line 30, in <module>
    reqs = dist.batch_isend_irecv([send_op])
  File "/usr/local/lib/python3.8/dist-packages/torch/distributed/distributed_c10d.py", line 1865, in batch_isend_irecv
    p2p_op.op(p2p_op.tensor, p2p_op.peer, p2p_op.group, p2p_op.tag)
  File "/usr/lib/python3.8/contextlib.py", line 120, in __exit__
    next(self.gen)
  File "/usr/local/lib/python3.8/dist-packages/torch/distributed/distributed_c10d.py", line 1810, in _coalescing_manager
    work = group._end_coalescing(device)
torch.distributed.DistBackendError: NCCL error in: /root/pytorch/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:3608, internal error - please report this issue to the NCCL developers, NCCL version 2.20.5
ncclInternalError: Internal check failed.
Last error:
Message truncated : received 4096 bytes instead of 2048
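
For comparison, below is a minimal sketch of the symmetric variant that I would expect to be well-defined, where both ranks batch the send and recv into a single batch_isend_irecv call. The process-group setup mirrors the snippet above and is an assumption about how the script is launched (two ranks via torchrun), not part of the original error report.

import torch
import torch.distributed as dist

# Assumed setup, matching the snippet above: NCCL backend, two ranks.
dist.init_process_group("nccl")
rank = dist.get_rank()
peer = 1 - rank
torch.cuda.set_device(rank)

send_tensor = torch.arange(2, dtype=torch.float32, device='cuda') + 2 * rank
recv_tensor = torch.randn(2, dtype=torch.float32, device='cuda')

# Both ranks batch the send and recv together, so the coalesced groups match.
send_op = dist.P2POp(dist.isend, send_tensor, peer)
recv_op = dist.P2POp(dist.irecv, recv_tensor, peer)
reqs = dist.batch_isend_irecv([send_op, recv_op])
for req in reqs:
    req.wait()

dist.destroy_process_group()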