Hello!
I want to write a distributed program and run it on a cluster of several multi-GPU nodes managed by Slurm.
The program should have one master process that sends different data to the other processes (equivalent to MPI_Send / MPI_Recv) and then collects the results from them (equivalent to MPI_Gather).
Could you please tell me whether my task can be solved using torch.distributed? In the official docs (https://pytorch.org/docs/stable/distributed.html) the backend support table shows only question marks for send/recv on GPU with the MPI backend.
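To make the question more concrete, here is a rough sketch of what I would like to write with torch.distributed. It is untested; the backend passed to init_process_group, the tensor sizes, and the device mapping are just my assumptions, and I do not know whether send/recv work with CUDA tensors here, which is exactly what I am asking:

```python
import torch
import torch.distributed as dist

def run(rank, world_size):
    # one process per GPU; device mapping is my assumption
    device = torch.device(f"cuda:{rank % torch.cuda.device_count()}")

    if rank == 0:
        # master: send a different chunk of data to every worker (like MPI_Send)
        for dst in range(1, world_size):
            chunk = torch.full((1024,), float(dst), device=device)
            dist.send(chunk, dst=dst)
    else:
        # worker: receive its chunk (like MPI_Recv) and do some work on it
        chunk = torch.empty(1024, device=device)
        dist.recv(chunk, src=0)
        chunk *= 2

    # collect all results on the master (like MPI_Gather)
    result = chunk if rank != 0 else torch.zeros(1024, device=device)
    gather_list = (
        [torch.empty(1024, device=device) for _ in range(world_size)]
        if rank == 0 else None
    )
    dist.gather(result, gather_list=gather_list, dst=0)

if __name__ == "__main__":
    # backend choice ("mpi" vs "nccl"/"gloo") is the part I am unsure about
    dist.init_process_group(backend="mpi")
    run(dist.get_rank(), dist.get_world_size())
```

I would launch one process per GPU via srun, but the question is whether the send/recv and gather calls above are supported for tensors living on the GPU.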
I also tried Horovod but found no wrappers around send/recv functions.