I want to launch distributed training across two machines. Machine A has two GPU cards and machine B has one, and I'd like to use all three cards for the training run.
But when I launch the distributed training specifying machine A with 2 cards and machine B with 1 card, the job fails to start.
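For reference, this is roughly how I'm launching it (a sketch using `torchrun` with the c10d rendezvous; `train.py`, the port, and the `<machine_A_ip>` placeholder stand in for my actual script and address):

```shell
# On machine A (2 GPUs) -- note --nproc_per_node=2
torchrun --nnodes=2 --nproc_per_node=2 \
  --rdzv_backend=c10d --rdzv_endpoint=<machine_A_ip>:29400 \
  train.py

# On machine B (1 GPU) -- note --nproc_per_node=1
torchrun --nnodes=2 --nproc_per_node=1 \
  --rdzv_backend=c10d --rdzv_endpoint=<machine_A_ip>:29400 \
  train.py
```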
PS: I use the "gloo" backend. I wonder whether the gloo backend simply does not support this kind of setup (different machines with different numbers of cards), but I am not sure.