`torch.distributed.init_process_group` hangs with 4 gpus with `backend="NCCL"` but not `"gloo"`

If you do not have root privileges, you can work around the hang by disabling NCCL's peer-to-peer (P2P) transport (as suggested here):

```
NCCL_P2P_DISABLE=1 CUDA_VISIBLE_DEVICES=0,1,2,3 python file.py
```
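If you prefer not to prefix the command, you can also set the variable from inside the script before the process group is created. Below is a minimal sketch of what `file.py` might look like for 4 GPUs; the use of `mp.spawn`, the `MASTER_ADDR`/`MASTER_PORT` values, and the worker layout are illustrative assumptions, not taken from the question.

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp


def worker(rank, world_size):
    # One process per GPU; each process binds to its own device.
    torch.cuda.set_device(rank)
    dist.init_process_group(
        backend="nccl",
        init_method="env://",
        rank=rank,
        world_size=world_size,
    )
    # Simple all-reduce to confirm the process group actually works.
    t = torch.ones(1, device=rank)
    dist.all_reduce(t)
    print(f"rank {rank}: all_reduce result = {t.item()}")
    dist.destroy_process_group()


if __name__ == "__main__":
    # Same effect as prefixing the command with NCCL_P2P_DISABLE=1;
    # it must be set before the first NCCL communicator is created.
    os.environ.setdefault("NCCL_P2P_DISABLE", "1")
    # Illustrative rendezvous settings for a single-node run (assumption).
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    world_size = 4
    mp.spawn(worker, args=(world_size,), nprocs=world_size)
```

Setting `NCCL_P2P_DISABLE=1` makes NCCL fall back to transfers staged through host memory instead of direct GPU-to-GPU copies, which is slower but avoids the hang when P2P is the culprit.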