GLOO and InfiniBand

seliad · July 15, 2020, 10:40am

Hi,
Can GLOO work with InfiniBand?

Our RTX 2080 Ti GPUs do not support GPUDirect/RDMA anyway, so all we want is something that works out of the box with reasonable bandwidth, so that communication does not become the bottleneck. We are doing P2P communication.

edit: I see that https://github.com/facebookincubator/gloo says it's supported, but I wonder if you have anything further to say about integration with PyTorch.

edit2: especially since https://pytorch.org/docs/stable/distributed.html says that GLOO does not support InfiniBand.

jiayisuse · July 15, 2020, 5:22pm

Hi,

GLOO does have an ibverbs transport: https://github.com/facebookincubator/gloo/tree/master/gloo/transport/ibverbs. However, it has never been used or tested with PyTorch. That may be why the PyTorch docs say GLOO does not support InfiniBand.

We are about to test the GLOO ibverbs transport over RDMA and integrate it with PyTorch for HPC scenarios. For now, GLOO ibverbs has not been integrated into PyTorch.
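
In the meantime, one possible workaround (a rough sketch, not something tested in this thread) is to keep GLOO's default TCP transport and point it at an IP-over-InfiniBand interface through the GLOO_SOCKET_IFNAME environment variable. This uses the InfiniBand link without ibverbs/RDMA. The interface name "ib0", the helper names configure_gloo_env and p2p_demo, and the address in the usage note are assumptions for illustration only.

```python
import os


def configure_gloo_env(ifname, master_addr, master_port=29500):
    """Set the environment variables that the env:// init method reads."""
    # GLOO_SOCKET_IFNAME selects the network interface for Gloo's TCP transport.
    # "ib0" is an assumed IPoIB interface name; check `ip addr` on your hosts.
    os.environ["GLOO_SOCKET_IFNAME"] = ifname
    os.environ["MASTER_ADDR"] = master_addr
    os.environ["MASTER_PORT"] = str(master_port)


def p2p_demo(rank, world_size=2):
    """Rank 0 sends a small CPU tensor to rank 1 over the gloo backend."""
    # Imported lazily so the env helper above can be used without PyTorch.
    import torch
    import torch.distributed as dist

    # Default init_method is env://, which uses MASTER_ADDR and MASTER_PORT.
    dist.init_process_group("gloo", rank=rank, world_size=world_size)
    if rank == 0:
        payload = torch.arange(4, dtype=torch.float32)
        dist.send(payload, dst=1)
    else:
        payload = torch.zeros(4)
        if rank == 1:
            dist.recv(payload, src=0)
    dist.destroy_process_group()
    return payload
```

On each host you would call something like configure_gloo_env("ib0", "10.0.0.1") (a hypothetical address for the rank 0 machine) and then p2p_demo(rank) with that host's rank. The achievable bandwidth depends on the IPoIB setup, so it is worth measuring before relying on it.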

jiayisuse · July 15, 2020, 5:45pm

Filed an issue https://github.com/pytorch/pytorch/issues/41485