Libtorch C++ MPI example

Hello,
Do you have a C++ example similar to the Python sample here:
https://pytorch.org/tutorials/intermediate/dist_tuto.html
Looking at the source code, I only see Python support; for example, here: …\torch\csrc\multiprocessing\init.cpp

Thank you.

Do you mean an example of distributed training using the C++ frontend? Unfortunately, we don’t have one that combines the two. There is also no torch.nn.parallel.DistributedDataParallel equivalent for the C++ frontend yet. That said, it is possible to use the distributed primitives from C++. See torch/lib/c10d for the source code.

Yep, that’s what I meant. I will take a look at torch/lib/c10d and try to build one myself.
Thanks for the reply.

@kais I am looking for an example of distributed training with the C++ frontend. If you managed to build one, could you please share it?

I managed to implement a few examples using Libtorch and MPI to help others in the community. Check out https://github.com/soumyadipghosh/eventgrad

@soumyadipghosh Thanks for contributing this to the community and for the C++/MPI example PR!

Just as a general note for this thread: using the c10d APIs enables distributed data-parallel training that produces the same results as DDP. However, calling allreduce to synchronize gradients only after the backward pass has finished will likely be slower than DDP, which overlaps computation with communication by synchronizing smaller buckets of gradients while the backward pass is still running.
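
To make the point about identical results concrete, here is a minimal sketch of the two synchronization strategies. It is an in-process simulation only: it uses neither MPI nor Libtorch, the "ranks" are plain vectors, and the names Grads, allreduce_average, sync_after_backward and sync_in_buckets are hypothetical and do not come from c10d or the eventgrad repository. In a real program the summation step would be a collective such as MPI_Allreduce with MPI_SUM, followed by a division by the world size. The sketch also does not model the overlap with the backward pass, which is where the performance difference comes from; it only shows that both strategies end with the same averaged gradient.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One flat gradient vector per simulated rank (a hypothetical stand-in for
// the gradients each process would hold after its local backward pass).
using Grads = std::vector<std::vector<double>>;

// Simulated allreduce-average over the index range [begin, end):
// afterwards every rank holds the element-wise mean across all ranks.
void allreduce_average(Grads& grads, std::size_t begin, std::size_t end) {
    assert(!grads.empty());
    for (const auto& g : grads) assert(end <= g.size());
    const double world_size = static_cast<double>(grads.size());
    for (std::size_t i = begin; i < end; ++i) {
        double sum = 0.0;
        for (const auto& g : grads) sum += g[i];        // the "sum" reduction
        for (auto& g : grads) g[i] = sum / world_size;  // average, written to all ranks
    }
}

// Strategy 1: a single allreduce over the whole gradient once backward is done.
Grads sync_after_backward(Grads grads) {
    assert(!grads.empty());
    allreduce_average(grads, 0, grads.front().size());
    return grads;
}

// Strategy 2: reduce the gradient in fixed-size buckets, one bucket at a time.
// DDP issues such bucket reductions while backward is still running; this
// simulation only reproduces the arithmetic, not the overlap.
Grads sync_in_buckets(Grads grads, std::size_t bucket_size) {
    assert(!grads.empty() && bucket_size > 0);
    const std::size_t n = grads.front().size();
    for (std::size_t begin = 0; begin < n; begin += bucket_size) {
        allreduce_average(grads, begin, std::min(begin + bucket_size, n));
    }
    return grads;
}
```

Calling sync_after_backward and sync_in_buckets on the same input returns equal gradients on every rank, so the choice between them affects only when communication happens, not what the optimizer step sees.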
