I have two machines, each with 4x 2080 Ti GPUs. I want to train my model using the NCCL distributed backend, but training is slow because the two machines are connected only by 1 Gbit/s (1000M) Ethernet cards.
So I want to connect the two machines with two InfiniBand cards instead.
However, my GPUs are GeForce cards, not Tesla. My question is: can InfiniBand still accelerate training if the GPUs don't support GPUDirect?