I am currently studying distributed RPC for hybrid parallelism. From the documentation, I understand that RPC supports the TensorPipe backend, which provides point-to-point communication. For hybrid parallelism, however, I need all-to-all collective communication. Is there any way to implement hybrid parallelism with collective communication using distributed RPC?
I would be grateful if anyone could suggest a solution.
cc Luca @lcw
I think the main use case of TensorPipe is not collectives. For those you can use other solutions, e.g. NCCL, Gloo, or UCC.
Luca, are there plans for TensorPipe to be a backend for such collectives?
Yes, correct, we currently don’t provide a way to do collectives on top of RPC/TensorPipe. The rationale is that the “native” collective libraries (NCCL, Gloo, MPI) already do a much better job at this, so we’re not optimizing TensorPipe and RPC for that use case. However, you should be able to combine RPC with the collective libraries very easily. Here is a tutorial showing how to do so with DDP, though the “lower-level” API should work too if you prefer it: “Combining Distributed DataParallel with Distributed RPC Framework” (PyTorch Tutorials 1.8.1+cu102 documentation).
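To make the "lower-level" route a bit more concrete, here is a minimal sketch of one process taking part in both a Gloo process group (for collectives) and the RPC framework (for point-to-point calls). It is an illustration under assumptions, not the tutorial's code: the function name `run_worker`, the worker names `worker{rank}`, and the two localhost ports are made up for the example. The two channels are given separate rendezvous ports, as the tutorial does. `all_gather` is used because Gloo supports it on CPU; whether `all_to_all` is available depends on the backend, so that is worth checking in the `torch.distributed` docs before relying on it.

```python
import torch
import torch.distributed as dist
import torch.distributed.rpc as rpc


def run_worker(rank, world_size, pg_port=29500, rpc_port=29501):
    """Hypothetical per-process entry point; ports and names are assumptions."""
    # Collective channel: a Gloo process group with its own rendezvous port.
    dist.init_process_group(
        backend="gloo",
        init_method=f"tcp://localhost:{pg_port}",
        rank=rank,
        world_size=world_size,
    )

    # Point-to-point channel: RPC over TensorPipe on a different port.
    options = rpc.TensorPipeRpcBackendOptions(
        init_method=f"tcp://localhost:{rpc_port}"
    )
    rpc.init_rpc(
        f"worker{rank}",
        rank=rank,
        world_size=world_size,
        rpc_backend_options=options,
    )

    # Collective: every rank contributes one tensor and receives all of them.
    local = torch.tensor([float(rank)])
    gathered = [torch.zeros(1) for _ in range(world_size)]
    dist.all_gather(gathered, local)

    # RPC: ask the next rank (or this one, if alone) to run torch.add.
    peer = f"worker{(rank + 1) % world_size}"
    rpc_result = rpc.rpc_sync(peer, torch.add, args=(torch.ones(2), 3))

    # Tear down both channels before returning.
    rpc.shutdown()
    dist.destroy_process_group()
    return torch.cat(gathered).tolist(), rpc_result.tolist()
```

Calling `run_worker(0, 1)` in a single process is enough to smoke-test the setup. For a real job you would start one process per rank with the same `world_size`, for example via `torch.multiprocessing.spawn`, and in a hybrid-parallel model use the process group for the data-parallel parts and RPC for the model-parallel parts.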