Should I use Gloo, MPI, or NCCL2 for distributed training across multiple EC2 instances?

Hi there,

I'm wondering which backend is most appropriate when you have multiple machines, each with multiple GPUs and a maximum network bandwidth of 25 Gbps.

Thanks!

Marcio