Hi,

First of all, thanks for the great effort in `torch.distributed`. I found it very useful for my project.

Is there any plan to support `gatherv`, `scatterv`, `igather`, `all_gatherv`, etc.?

We plan to add collectives as they are needed in many projects. What is your need for the `gatherv` and `scatterv` routines?

I am exploring whether PyTorch can be used as a quick way to write portable code involving distributed matrix multiplications.

For example, consider the case where the world size is 4 (`torch.distributed.get_world_size()` returns 4). A 1000 x 10 matrix can then be represented as a 250 x 10 matrix on each rank. I needed `reduce`, `all_gather`, and `all_reduce` for various types of matrix multiplication involving such "thin and tall" matrices. However, if the number of rows is not divisible by the world size, e.g. when the data matrix is 1001 x 10, I need `all_gatherv` in place of `all_gather`, since the per-rank chunks are no longer equal in size.
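To make the use case concrete, here is a minimal sketch of the uneven row partition that motivates `all_gatherv`. The helper `split_rows` is hypothetical, written just for illustration; it is not part of `torch.distributed`. It computes the per-rank row counts that a `gatherv`-style collective would take as its counts argument:

```python
def split_rows(n_rows, world_size):
    """Per-rank row counts when n_rows may not divide evenly.

    These counts are exactly what a gatherv-style collective needs;
    with plain all_gather every rank must contribute the same shape.
    (Hypothetical helper for illustration only.)
    """
    base, rem = divmod(n_rows, world_size)
    # The first `rem` ranks hold one extra row each.
    return [base + (1 if r < rem else 0) for r in range(world_size)]

print(split_rows(1000, 4))  # [250, 250, 250, 250] -- equal chunks, all_gather works
print(split_rows(1001, 4))  # [251, 250, 250, 250] -- unequal, needs all_gatherv
```

A common workaround with plain `all_gather` is to pad every rank's chunk up to the maximum count, gather, and then trim the padding on each receiver, but a native `all_gatherv` would avoid the extra copy and the bookkeeping.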