Hello,
Do you guys have a C++ example similar to the Python sample here: https://pytorch.org/tutorials/intermediate/dist_tuto.html?
From looking at the source code, I only see Python support; for example, in …\torch\csrc\multiprocessing\init.cpp
Do you mean an example of distributed training using the C++ frontend? Unfortunately, we don't have one that combines the two. There is also no torch.nn.parallel.DistributedDataParallel equivalent for the C++ frontend yet. That said, it is possible to use the distributed primitives from C++; see torch/lib/c10d for the source code.
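For anyone finding this thread later, here is a minimal sketch of what calling those primitives directly might look like. It assumes a libtorch build with MPI support; the include path and exact return types have moved around between releases, so treat it as a starting point rather than the official API.

```cpp
// Minimal sketch: allreduce a tensor with c10d's MPI process group.
// Assumes libtorch was built with distributed/MPI support; header
// locations and signatures differ between releases.
#include <c10d/ProcessGroupMPI.hpp>
#include <torch/torch.h>

#include <iostream>
#include <vector>

int main() {
  // Initializes MPI and returns a process group spanning all ranks
  // launched via mpirun.
  auto pg = c10d::ProcessGroupMPI::createProcessGroupMPI();

  // Each rank contributes a tensor; allreduce sums them in place.
  std::vector<at::Tensor> tensors = {torch::ones({2, 2})};
  auto work = pg->allreduce(tensors);
  work->wait();  // block until the collective completes

  // Every rank now holds the elementwise sum over all ranks.
  std::cout << tensors[0] << std::endl;
  return 0;
}
```

Launched with e.g. `mpirun -np 4 ./allreduce_example`, each rank should print a 2x2 tensor of fours.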
@soumyadipghosh Thanks for contributing this to the community and for the C++/MPI example PR!
Just as a general note for this thread: using the c10d APIs enables distributed data parallel training that produces the same results as DDP. However, calling allreduce after the backward pass to synchronize gradients will likely lag DDP in performance, since DDP overlaps computation and communication by synchronizing smaller buckets of gradients while the backward pass is still running.
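To make that difference concrete, the naive "allreduce after backward" pattern looks roughly like the sketch below. The toy model, data, and hyperparameters are placeholders, and the process group is set up as in the earlier snippet; note that no communication starts until `loss.backward()` has completely finished.

```cpp
// Sketch of gradient synchronization done entirely after backward.
// Correct, but the network is idle during the whole backward pass,
// unlike DDP, which allreduces buckets of gradients as they become
// ready via autograd hooks.
#include <c10d/ProcessGroupMPI.hpp>
#include <torch/torch.h>

#include <vector>

int main() {
  auto pg = c10d::ProcessGroupMPI::createProcessGroupMPI();

  // Toy model and data standing in for a real training setup.
  torch::nn::Linear model(10, 2);
  torch::optim::SGD optimizer(model->parameters(), /*lr=*/0.01);
  auto input = torch::randn({8, 10});
  auto target = torch::randint(0, 2, {8}, torch::kLong);

  optimizer.zero_grad();
  auto loss = torch::nll_loss(
      torch::log_softmax(model->forward(input), /*dim=*/1), target);
  loss.backward();

  // Only now do we synchronize: allreduce each gradient and average.
  for (auto& param : model->parameters()) {
    std::vector<at::Tensor> grads = {param.grad()};
    pg->allreduce(grads)->wait();      // sum gradients across ranks
    param.grad().div_(pg->getSize());  // average them
  }
  optimizer.step();
  return 0;
}
```

DDP instead fires a hook as each gradient becomes ready and allreduces gradients in buckets, so communication for the early buckets proceeds while later gradients are still being computed.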