For example, the Python API has interfaces like torch.distributed.all_reduce, but I couldn't find similar interfaces in the LibTorch C++ API.
Thank you for all the responses.
In C++, there are APIs inside ProcessGroupNCCL that call directly into the NCCL API. Is this what you are looking for?
Yes, I saw it. Thank you very much for your reply.
I have another question: how can I assign a specific stream to ProcessGroupNCCL and its subsequent communication operations?