Libtorch C++ MPI example

kais · March 12, 2019, 7:49pm

Hello,
Do you guys have a C++ example similar to the Python sample here:
https://pytorch.org/tutorials/intermediate/dist_tuto.html
From looking at the source code, I only see Python support; for example, here: …\torch\csrc\multiprocessing\init.cpp

Thank you.

pietern · March 13, 2019, 4:40pm

Do you mean an example of distributed training using the C++ frontend? Unfortunately, we don’t have one that combines the two. There is also no torch.nn.parallel.DistributedDataParallel equivalent for the C++ frontend yet. That said, it is possible to use the distributed primitives from C++. See torch/lib/c10d for the source code.

kais · March 13, 2019, 8:40pm

Yep, that’s what I meant. I will take a look at torch/lib/c10d and try to build one myself.
Thanks for the reply.

soumyadipghosh · October 10, 2019, 3:44am

@kais I am looking for an example of distributed training with the C++ frontend. If you managed to build one, could you please share it?

soumyadipghosh · November 28, 2020, 6:39am

I managed to implement a few examples using Libtorch and MPI to help others in the community. Check out https://github.com/soumyadipghosh/eventgrad

osalpekar · November 29, 2020, 12:44am

@soumyadipghosh Thanks for contributing this to the community and for the C++/MPI example PR!

Just as a general note for this thread: using the c10d APIs enables distributed data parallel training that produces the same results as DDP. However, calling allreduce after the backward pass to synchronize gradients will likely be slower than DDP, which overlaps computation and communication by synchronizing smaller buckets of gradients during the backward pass.
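
To make the "same results" point concrete, here is a minimal sketch in plain standard C++. It is not c10d, MPI, or DDP code: the ranks are simulated in-process as vectors, and the names `Grads`, `allreduce_average`, `sync_after_backward`, and `sync_in_buckets` are hypothetical helpers made up for illustration. It only shows that averaging all gradients in one step and averaging them bucket by bucket give identical values. The performance difference comes from overlapping the bucket reductions with the backward computation, which this sketch does not model.

```cpp
#include <cstddef>
#include <vector>

// Each inner vector holds the local gradients of one simulated rank.
// (Hypothetical type for illustration; real code would use tensors.)
using Grads = std::vector<std::vector<double>>;

// In-process stand-in for allreduce(sum) followed by division by the world
// size, applied to elements [begin, end) of every rank's gradient buffer.
void allreduce_average(Grads& grads, std::size_t begin, std::size_t end) {
    const std::size_t world_size = grads.size();
    for (std::size_t i = begin; i < end; ++i) {
        double sum = 0.0;
        for (const auto& g : grads) sum += g[i];
        const double avg = sum / static_cast<double>(world_size);
        for (auto& g : grads) g[i] = avg;  // every rank ends with the same value
    }
}

// Strategy 1: a single reduction over all gradients once backward is done.
void sync_after_backward(Grads& grads) {
    if (grads.empty()) return;
    allreduce_average(grads, 0, grads[0].size());
}

// Strategy 2: reduce fixed-size buckets, starting from the end of the buffer,
// assuming gradients of later layers become ready first during backward.
void sync_in_buckets(Grads& grads, std::size_t bucket_size) {
    if (grads.empty() || bucket_size == 0) return;
    std::size_t end = grads[0].size();
    while (end > 0) {
        const std::size_t begin = end > bucket_size ? end - bucket_size : 0;
        allreduce_average(grads, begin, end);
        end = begin;
    }
}
```

For example, with two simulated ranks holding {1, 2, 3} and {3, 6, 9}, both `sync_after_backward(grads)` and `sync_in_buckets(grads, 2)` leave every rank with {2, 4, 6}. In a real setup each bucket reduction would be launched asynchronously while the remaining gradients are still being computed.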