https://pytorch.org/docs/stable/distributed.html#which-backend-to-use
"If you encounter any problem with NCCL, use Gloo as the fallback option."
What mechanism does PyTorch provide for falling back from NCCL to Gloo?
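There is no automatic switch that I'm aware of. You would need to specify the backend yourself during distributed initialization, via the `backend` argument of `torch.distributed.init_process_group`.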
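A minimal sketch of picking the backend at initialization time, in case it helps. The helper names `choose_backend` and `init_distributed` and the `DIST_BACKEND` environment variable are made up for illustration and are not part of PyTorch; the sketch also assumes the default `env://` rendezvous, so `MASTER_ADDR`, `MASTER_PORT`, `RANK` and `WORLD_SIZE` must already be set (for example by `torchrun`).

```python
import os


def choose_backend(cuda_available: bool, nccl_available: bool) -> str:
    """Return "nccl" only when both CUDA and NCCL are usable, else "gloo"."""
    return "nccl" if cuda_available and nccl_available else "gloo"


def init_distributed() -> str:
    """Initialize the default process group and return the backend used."""
    # Imported here so choose_backend can be used without PyTorch installed.
    import torch
    import torch.distributed as dist

    backend = choose_backend(torch.cuda.is_available(), dist.is_nccl_available())
    # DIST_BACKEND is a hypothetical override for this sketch, not a PyTorch
    # setting; it lets you force "gloo" by hand when NCCL misbehaves at runtime.
    backend = os.environ.get("DIST_BACKEND", backend)
    dist.init_process_group(backend=backend)
    return backend
```

This only checks availability up front. It does not catch NCCL errors that appear later during training, so in that case you would rerun the job with the backend set to `gloo` on every rank.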