Distributed Model Parallel Using Distributed RPC

iffiX · July 16, 2020, 3:23pm

You can use MPI to implement rpc, techincally, but the performance could be really bad, for example, in order to send a message of arbitrary length, in MPI you need to send the size to your target, then the target has to allocate the memory, then you can send the payload, since MPI is not a raw connection like tcp or infiniband, you would expect more delay in these two communications, and you have to deal with process failures! MPI will fail if any component process has failed, and that’s why we would like to remove that behavior in rpc, see 88856.