get_backend() returns 'undefined' even when NCCL is available

Hi,

I’m currently using torchrun to do distributed training.

I call init_process_group() without any parameters in my code, since it can automatically pick the backend based on the device type of my current environment (see Distributed communication package - torch.distributed — PyTorch 2.7 documentation).

My environment has GPUs, and I can see NCCL being used correctly based on the logs generated by the NCCL_DEBUG env var.

Problem: However, torch.distributed.get_backend() still returns 'undefined', even after lazy initialization is done and NCCL is clearly in use. I also tried passing the default process group explicitly as its input parameter, but it still returns the same. It only returns 'nccl' when I explicitly call init_process_group('nccl').
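
For reference, here is a minimal sketch of my setup (the script name and world size are placeholders; I launch it with something like `NCCL_DEBUG=INFO torchrun --nproc_per_node=2 repro.py`):

```python
import os

import torch
import torch.distributed as dist

# Launched via: NCCL_DEBUG=INFO torchrun --nproc_per_node=2 repro.py
dist.init_process_group()  # no arguments: backend is picked per device type

local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun
torch.cuda.set_device(local_rank)

# A CUDA collective runs over NCCL, as the NCCL_DEBUG logs confirm...
t = torch.ones(1, device=f"cuda:{local_rank}")
dist.all_reduce(t)

# ...yet the process group itself reports no single backend.
print(dist.get_backend())                  # prints 'undefined'
print(dist.get_backend(dist.group.WORLD))  # same for the explicit default PG

dist.destroy_process_group()
```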

Question: Is this expected behavior? How can I detect which backend torch is using if I call init_process_group() with no parameters?

I guess this is expected, as the backend was undefined during the PG creation, and I don't see where it's being updated (code). CC @kwen2501 in case the backend is supposed to be updated after the init call selects the default one for the corresponding device.


Yes, that’s expected – at init, no backend was tied to the process group in your case.
The collective works because PyTorch will try to find a backend for the device identified by your tensor.
The no-parameter init is meant for the case where your program needs to perform collectives on multiple device types, e.g. “cuda” and “cpu”. Thus, when get_backend() is queried, the process group cannot name a single backend because it is “floating”.
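
If you want to see which backend is bound to each device type, you can query the backend configuration of the group instead of its (single) backend. A minimal sketch, assuming your PyTorch version exposes get_backend_config():

```python
import torch.distributed as dist

dist.init_process_group()  # device-agnostic init, as above

# Per-device-type backend mapping of the default process group,
# e.g. 'cpu:gloo,cuda:nccl' on a GPU machine.
print(dist.get_backend_config())

dist.destroy_process_group()
```

As a coarser check, dist.is_nccl_available() tells you whether the NCCL backend was built into your install at all, though not which backend a particular group ends up using.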


Thanks for the explanation, @kwen2501 and @ptrblck!