MPI_Gatherv and MPI_Scatterv (feature request / idea)

The torch.distributed documentation does not make clear that all tensors passed to gather() must be the same size. After reading the code I noticed that the MPI backend's C++ implementation uses MPI_Gather, which imposes that restriction. It might make more sense to use MPI_Gatherv instead, so that tensors of variable size can be accepted. I would try to implement it myself, but my C++ skills are not good enough. If anyone is interested and wants to open a pull request, that would be great! This is the file I am referring to: https://github.com/pytorch/pytorch/blob/eb76b7a564121c5fede749ad7d0a36f2b61a0a95/torch/lib/c10d/ProcessGroupMPI.cpp#L430

At the very least, if someone can provide a guide on how to make the change for MPI_Gatherv, I can make the corresponding change for MPI_Scatterv myself.

For MPI_Gatherv we need to provide a list of receive counts instead of a single integer; this should be easy by calling numel() on each tensor in the gather_list. We also need to provide displacements (see https://www.mpich.org/static/docs/v3.1/www3/MPI_Gatherv.html). We could set the first displacement to 0 and each subsequent displacement to the previous displacement plus the previous tensor's numel(). This is if we want to modify the built-in gather: since MPI_Gatherv accepts tensors of any size, it also covers the existing case of equally sized tensors. Alternatively, we could add a new torch.distributed.gatherv().
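
To make that concrete, here is a minimal sketch of the counts/displacements bookkeeping, written against the plain MPI C API rather than the actual ProcessGroupMPI.cpp code. The function name gatherv_floats and the recvSizes parameter (standing in for numel() called on each tensor in gather_list) are hypothetical, and the sketch assumes contiguous float buffers:

```cpp
// Sketch only: plain MPI, with recvSizes standing in for numel() of each
// tensor in gather_list (meaningful on the root rank only).
#include <mpi.h>
#include <vector>

void gatherv_floats(const float* sendBuf, int sendCount,
                    const std::vector<int>& recvSizes,  // per-rank numel()s
                    std::vector<float>& recvBuf, int root, MPI_Comm comm) {
  int rank = 0;
  MPI_Comm_rank(comm, &rank);

  std::vector<int> recvCounts;
  std::vector<int> displs;
  if (rank == root) {
    recvCounts = recvSizes;
    displs.assign(recvCounts.size(), 0);  // first displacement is 0
    for (size_t i = 1; i < recvCounts.size(); ++i) {
      // previous displacement plus the previous tensor's element count
      displs[i] = displs[i - 1] + recvCounts[i - 1];
    }
    recvBuf.resize(displs.back() + recvCounts.back());
  }

  // recvBuf, recvCounts, and displs are only significant on the root rank.
  MPI_Gatherv(sendBuf, sendCount, MPI_FLOAT,
              recvBuf.data(), recvCounts.data(), displs.data(),
              MPI_FLOAT, root, comm);
}
```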

Any help would be appreciated!

This is a great idea. I agree that having both gather and allgather support variable sizes would be very useful. Also, since the displacements are an implementation detail, the underlying C++ implementation can take care of sharing each process's contribution size, allocating the required memory, and returning the list of tensors.
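
For illustration, a rough sketch of that flow, again in plain MPI with std::vector<float> standing in for tensors (the name gatherv_to_lists and its signature are hypothetical, not the c10d API): each rank first shares its element count via a regular MPI_Gather, the root computes displacements and allocates the receive buffer, and the flat result is split back into per-rank outputs so the caller never sees the displacements:

```cpp
// Sketch: exchange sizes, allocate on root, gather variable-length data,
// then split the flat buffer back into one result per rank.
#include <mpi.h>
#include <vector>

std::vector<std::vector<float>> gatherv_to_lists(
    const std::vector<float>& send, int root, MPI_Comm comm) {
  int rank = 0, worldSize = 0;
  MPI_Comm_rank(comm, &rank);
  MPI_Comm_size(comm, &worldSize);

  // Step 1: share each process's contribution size with the root.
  int sendCount = static_cast<int>(send.size());
  std::vector<int> counts(rank == root ? worldSize : 0);
  MPI_Gather(&sendCount, 1, MPI_INT, counts.data(), 1, MPI_INT, root, comm);

  // Step 2: root computes displacements and allocates the required memory.
  std::vector<int> displs;
  std::vector<float> flat;
  if (rank == root) {
    displs.assign(worldSize, 0);
    for (int i = 1; i < worldSize; ++i) {
      displs[i] = displs[i - 1] + counts[i - 1];
    }
    flat.resize(displs.back() + counts.back());
  }

  // Step 3: gather the variable-length payloads themselves.
  MPI_Gatherv(send.data(), sendCount, MPI_FLOAT,
              flat.data(), counts.data(), displs.data(),
              MPI_FLOAT, root, comm);

  // Step 4: split the flat buffer into per-rank results (root only).
  std::vector<std::vector<float>> out;
  if (rank == root) {
    for (int i = 0; i < worldSize; ++i) {
      out.emplace_back(flat.begin() + displs[i],
                       flat.begin() + displs[i] + counts[i]);
    }
  }
  return out;
}
```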

I created https://github.com/pytorch/pytorch/issues/23299 to track the feature.