I have a setup with two GPUs using DistributedDataParallel, running the latest PyTorch nightly build.
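For context, my two worker processes are spawned roughly like this (a simplified sketch; the actual code wraps the model in DistributedDataParallel and stores the rank on the class as self.rank, and the rendezvous address is just a placeholder):

import torch
import torch.distributed as tdist
import torch.multiprocessing as mp

def worker(rank, world_size):
    # one process per GPU; NCCL is the backend used for CUDA tensors
    tdist.init_process_group("nccl", init_method="tcp://127.0.0.1:23456",  # placeholder address
                             rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)
    # ... build the DDP model, run inference, then the gather shown below ...

if __name__ == "__main__":
    mp.spawn(worker, args=(2,), nprocs=2)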
I try to gather the results onto rank 0, but I always end up with two copies of the results from GPU 1:
items = torch.from_numpy(items[:10]).to(self.rank)  # .to(rank) is necessary because of NCCL; [:10] just to keep the output readable
print(f"Rank {self.rank} items:")
print(items)
if self.rank == 0:
    items_gather_list = [torch.zeros_like(items)] * 2
tdist.barrier()
print("GATHER")
tdist.gather(items,
             gather_list=items_gather_list if self.rank == 0 else None,
             dst=0)
if self.rank == 0:
    print("LEN GATER LIST", len(items_gather_list))
    print(items_gather_list[0])
    print(" ")
    print(items_gather_list[1])
This is the output:
Rank 1 items:
Rank 0 items:
tensor([ 1, 1073, 1589, 1453, 1997, 1075, 115, 1041, 1133, 756],
device='cuda:1')
tensor([ 44, 1723, 1039, 1321, 1065, 281, 1364, 1830, 296, 1228],
device='cuda:0')
GATHER
GATHER
LEN GATER LIST 2
tensor([ 1, 1073, 1589, 1453, 1997, 1075, 115, 1041, 1133, 756],
device='cuda:0')
tensor([ 1, 1073, 1589, 1453, 1997, 1075, 115, 1041, 1133, 756],
device='cuda:0')
What am I doing wrong?