Run RPC over MPI for Parameter Server DRL

I am currently developing a DRL framework that can run on a cluster with MPI. I am able to perform synchronous training using DDP over MPI. Now I want to explore a different structure using a parameter server and MPI. I saw that RPC would be the right tool, but I cannot figure out how, or if, RPC can run with MPI.

I saw this example, but it only works when all ranks are running on the same node. Is there a way to accomplish this with PyTorch alone, or is an additional tool needed?

You do not have to run RPC over MPI. PyTorch distributed provides the Gloo and NCCL backends; you can pass `'gloo'` or `'nccl'` to `init_process_group()`.
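A minimal sketch of a multi-node Gloo setup, assuming your launcher (e.g. mpirun or torchrun) exports `RANK`, `WORLD_SIZE`, `MASTER_ADDR`, and `MASTER_PORT`; under Open MPI you may need to derive the rank from `OMPI_COMM_WORLD_RANK` yourself:

```python
import os
import torch.distributed as dist

# MASTER_ADDR/MASTER_PORT must point to a host reachable from every node.
# With init_method="env://", rank and world size are read from the environment;
# they are passed explicitly here for clarity.
dist.init_process_group(
    backend="gloo",
    init_method="env://",
    rank=int(os.environ["RANK"]),
    world_size=int(os.environ["WORLD_SIZE"]),
)
```

Because the rendezvous goes over TCP rather than MPI, this works across nodes as long as the master address is reachable from all of them.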

For RPC, to get better performance, you can use TensorPipe as the backend option.
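A minimal sketch of initializing RPC with the TensorPipe backend (the default in recent PyTorch versions) for a parameter-server layout; the names `"ps"` and `"worker{rank}"` and the `num_worker_threads` value are illustrative assumptions, and the same `MASTER_ADDR`/`MASTER_PORT` environment variables are assumed to be set:

```python
import os
import torch.distributed.rpc as rpc

rank = int(os.environ["RANK"])
world_size = int(os.environ["WORLD_SIZE"])

# TensorPipe is the default RPC backend; num_worker_threads is a tunable.
options = rpc.TensorPipeRpcBackendOptions(num_worker_threads=16)

# In this sketch, rank 0 plays the parameter server and the rest are workers.
name = "ps" if rank == 0 else f"worker{rank}"
rpc.init_rpc(
    name=name,
    rank=rank,
    world_size=world_size,
    rpc_backend_options=options,
)

# ... issue rpc.rpc_sync / rpc.rpc_async / rpc.remote calls here ...

rpc.shutdown()  # blocks until all outstanding RPC work is done
```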