How to perform distributed communication (NCCL) using LibTorch?

For example, the Python API has interfaces such as torch.distributed.all_reduce, but I couldn't find similar interfaces in the LibTorch C++ API.

Thanks in advance for any responses.

In C++, there are APIs inside ProcessGroupNCCL that call directly into the NCCL API. Is this what you are looking for?
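
To make that concrete, here is a minimal sketch of an allreduce through the c10d C++ API, assuming a recent CUDA build of LibTorch. Header paths, the TCPStore constructor, and the Options signatures have moved around between releases, and the address/port and environment-variable plumbing below are just illustrative assumptions, so treat this as a starting point rather than a reference implementation:

```cpp
// Sketch: NCCL allreduce via the c10d C++ API (assumes a recent
// CUDA-enabled LibTorch; signatures vary across versions).
#include <cstdlib>
#include <iostream>
#include <string>
#include <vector>

#include <torch/torch.h>
#include <torch/csrc/distributed/c10d/TCPStore.hpp>
#include <torch/csrc/distributed/c10d/ProcessGroupNCCL.hpp>

int main() {
  // Rank and world size would normally come from your launcher
  // (e.g. the RANK and WORLD_SIZE environment variables).
  const int rank = std::stoi(std::getenv("RANK"));
  const int world_size = std::stoi(std::getenv("WORLD_SIZE"));

  // Rendezvous through a TCPStore hosted by rank 0; the address
  // and port here are illustrative.
  c10d::TCPStoreOptions store_opts;
  store_opts.port = 29500;
  store_opts.isServer = (rank == 0);
  store_opts.numWorkers = world_size;
  auto store =
      c10::make_intrusive<c10d::TCPStore>("127.0.0.1", store_opts);

  // One ProcessGroupNCCL per process, roughly the C++ counterpart
  // of torch.distributed.init_process_group("nccl") in Python.
  auto pg_opts = c10d::ProcessGroupNCCL::Options::create();
  c10d::ProcessGroupNCCL pg(store, rank, world_size, pg_opts);

  // Each rank contributes a CUDA tensor; allreduce sums in place.
  std::vector<at::Tensor> tensors{torch::ones(
      {4},
      torch::TensorOptions().device(torch::kCUDA).dtype(torch::kFloat))};
  auto work = pg.allreduce(tensors);
  work->wait();  // Block until the collective has completed.

  // Every element should now equal world_size on every rank.
  std::cout << tensors[0] << std::endl;
  return 0;
}
```

Note that ProcessGroupNCCL is only available in CUDA builds of LibTorch, and you launch one such process per rank, just as with torch.distributed in Python.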

Yes, I saw it. Thank you very much for your reply.

I'd like to ask a follow-up question: how can I assign a specific CUDA stream to ProcessGroupNCCL and its subsequent communication operations?
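
As far as I can tell, ProcessGroupNCCL launches the collectives on per-device streams it creates internally, so there is no public way to hand it a specific stream; what you can control is the current stream of your own computation, which work->wait() synchronizes against the internal NCCL stream. A sketch of that interaction, assuming recent LibTorch headers:

```cpp
// Sketch: steering your own stream around a ProcessGroupNCCL
// collective. The NCCL kernels themselves run on streams the process
// group creates internally (to my knowledge); work->wait() makes the
// *current* stream wait on them.
#include <vector>

#include <c10/cuda/CUDAStream.h>
#include <c10/cuda/CUDAGuard.h>
#include <torch/csrc/distributed/c10d/ProcessGroupNCCL.hpp>

void allreduce_on_my_stream(c10d::ProcessGroupNCCL& pg,
                            std::vector<at::Tensor>& tensors) {
  // A user-owned stream for the surrounding computation.
  c10::cuda::CUDAStream my_stream = c10::cuda::getStreamFromPool();
  c10::cuda::CUDAStreamGuard guard(my_stream);  // make it current

  // Kernels you launch here are queued on my_stream.
  auto work = pg.allreduce(tensors);

  // wait() inserts a dependency: anything queued on the current
  // stream (my_stream) after this point runs only once the
  // collective on the internal NCCL stream has finished.
  work->wait();
}
```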