I train on a single machine with multiple GPUs. In init_process_group(), I set world_size=1 and rank=0, but I am not sure whether that is correct for multi-GPU training on one node. GPU utilization looks fine (about 100% on both GPUs). However, when I try to gather the same tensor from the different GPUs, both dist.gather and dist.all_gather fail; dist.gather raises the error below:
```
ValueError: ProcessGroupGloo::gather: Incorrect output list size 2. Output list size should be 1, same as size of the process group.
```
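Here is a minimal sketch of how I initialize the group and call dist.gather, reduced to just the failing call (the init_method path is a placeholder, not my real path):

```python
import torch
import torch.distributed as dist

# A single process for the whole node, even though I train on 2 GPUs.
dist.init_process_group(
    backend="gloo",
    init_method="file:///tmp/dist_init",  # placeholder; I use a file store on Windows
    world_size=1,
    rank=0,
)

tensor = torch.ones(2)
# I expected one output slot per GPU, but the group contains only one
# process, so this raises the ValueError above.
gather_list = [torch.zeros(2) for _ in range(2)]
dist.gather(tensor, gather_list, dst=0)
```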
My environment is Windows, torch 1.7.1, the gloo backend, and CUDA 10.2.
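Should I instead be starting one process per GPU, with world_size equal to the number of GPUs? The sketch below is what I would try (again, the init_method is a placeholder, and I am not sure this is the intended pattern on Windows):

```python
import torch
import torch.distributed as dist
import torch.multiprocessing as mp

def run(rank, world_size):
    # One process per GPU; world_size is the number of GPUs on the node.
    dist.init_process_group(
        backend="gloo",
        init_method="file:///tmp/dist_init",  # placeholder shared-file store
        world_size=world_size,
        rank=rank,
    )
    tensor = torch.ones(2) * rank
    # With world_size processes, an output list of the same size should match.
    gather_list = [torch.zeros(2) for _ in range(world_size)]
    dist.all_gather(gather_list, tensor)
    dist.destroy_process_group()

if __name__ == "__main__":
    mp.spawn(run, args=(2,), nprocs=2)
```

Is this the correct way to gather the same tensor across GPUs on a single node?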