When does p2p communication use different NCCL streams?

When using point-to-point communications, it might sometimes be desirable to schedule unrelated comm calls on different streams, e.g. in the backward of Ring attention.
Based on this example (InternEvo/internlm/model/ops/ring_flash_attn/zigzag_ring_flash_attn_with_sliding_window.py at f2949df89c15e1c16b6d48412ae9f94122ef463d · InternLM/InternEvo · GitHub), I profiled it and saw 3 NCCL streams for p2p comms, matching local_dkv_comm, local_kv_comm and dkv_comm.

However, I couldn’t reproduce this elsewhere with 2 process groups.
Is it because torch uses different streams only when there are a number of process groups?

IIUC, for each ProcessGroup we assign the job to an NCCL stream. So if you want 3 streams, maybe you want to create a separate PG for each of them?
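
A minimal sketch of that suggestion, assuming the default process group is already initialized and that every rank runs the same code. The helper names `make_comm_groups` and `ring_exchange` are hypothetical, made up for illustration, and are not part of torch or InternEvo. Whether the resulting groups really show up as separate NCCL streams is something to confirm in the profiler.

```python
import torch.distributed as dist


def make_comm_groups(names, ranks=None, backend=None):
    """Create one process group per name (hypothetical helper).

    dist.new_group is collective: every rank must call it, and in the
    same order, even ranks that are not members of the new group.
    """
    if ranks is None:
        ranks = list(range(dist.get_world_size()))
    return {name: dist.new_group(ranks=ranks, backend=backend) for name in names}


def ring_exchange(send_buf, recv_buf, group):
    """Send to the next rank and receive from the previous one on `group`.

    Assumes `group` spans all ranks, so global ranks can be used as peers.
    Returns the list of work handles; call .wait() on each before reading
    recv_buf.
    """
    rank = dist.get_rank()
    world_size = dist.get_world_size()
    ops = [
        dist.P2POp(dist.isend, send_buf, (rank + 1) % world_size, group=group),
        dist.P2POp(dist.irecv, recv_buf, (rank - 1) % world_size, group=group),
    ]
    return dist.batch_isend_irecv(ops)
```

With something like `groups = make_comm_groups(["local_kv_comm", "local_dkv_comm", "dkv_comm"])`, each unrelated p2p exchange can then be issued on its own group, e.g. `ring_exchange(k_send, k_recv, groups["local_kv_comm"])`, where the buffer names are again placeholders.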

That’s right. Closing in favour of the answer on GitHub.