Difference between ProcessGroup and Backend classes

Hello,

I am curious about the difference between the ProcessGroup and Backend C++ classes. The docs on third-party backends say that backends should inherit from c10d::ProcessGroup. However, the page on Customizing ProcessGroup Backends says to inherit from the Backend class. I also saw that the PR for the newly created XCCL backend inherits from the Backend class. From all this, I have a few questions:

  • What is the difference between these classes?
  • How do these classes interact?
  • What is the future plan for these classes? Will ProcessGroup replace the Backend class at any point?

Thank you,
Nathan

Hi, long story short: we split Backend out of ProcessGroup in 2023, so a custom backend should inherit from the Backend class. ProcessGroup is now mostly there for the Python API and for backward compatibility. When a user creates a ProcessGroup in Python by calling init_process_group, we create a Backend instance for each device type (CPU, GPU, etc.), so when you pass in a tensor on a given device, the backend registered for that device is the one that gets used.
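
To make the dispatch idea concrete, here is a small toy model in plain Python. It is only a sketch of the relationship described above, not PyTorch's actual implementation: ToyTensor, ToyBackend, ToyProcessGroup, toy_init_process_group and the "gloo-like" / "nccl-like" names are all hypothetical stand-ins made up for illustration, and the real c10d classes have a much larger interface.

```python
# Toy model only: every class and function here is a hypothetical stand-in,
# not part of PyTorch. It illustrates one front-end object holding one
# backend per device type and routing calls by the tensor's device.
from dataclasses import dataclass


@dataclass
class ToyTensor:
    """Stand-in for a tensor; only the device type matters for dispatch."""
    device_type: str  # e.g. "cpu" or "cuda"


class ToyBackend:
    """Per-device communication implementation (what a custom backend subclasses)."""
    name = "base"

    def allreduce(self, tensor: ToyTensor) -> str:
        return f"{self.name} allreduce on {tensor.device_type}"


class GlooLikeBackend(ToyBackend):
    name = "gloo-like"


class NcclLikeBackend(ToyBackend):
    name = "nccl-like"


class ToyProcessGroup:
    """User-facing wrapper: owns the backends and picks one per call."""

    def __init__(self) -> None:
        self._backends: dict[str, ToyBackend] = {}

    def register_backend(self, device_type: str, backend: ToyBackend) -> None:
        self._backends[device_type] = backend

    def allreduce(self, tensor: ToyTensor) -> str:
        # Route to the backend registered for the tensor's device type.
        backend = self._backends.get(tensor.device_type)
        if backend is None:
            raise RuntimeError(f"no backend registered for {tensor.device_type!r}")
        return backend.allreduce(tensor)


def toy_init_process_group() -> ToyProcessGroup:
    """Assumed setup step: create one backend per device type up front."""
    pg = ToyProcessGroup()
    pg.register_backend("cpu", GlooLikeBackend())
    pg.register_backend("cuda", NcclLikeBackend())
    return pg


if __name__ == "__main__":
    pg = toy_init_process_group()
    print(pg.allreduce(ToyTensor("cpu")))
    print(pg.allreduce(ToyTensor("cuda")))
```

In this picture, a third-party backend only needs to provide the ToyBackend-style class; the wrapper and the per-device lookup stay the same for every backend.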