Does DDP support dual-socket machines?

I know DDP can easily support training on multiple GPUs, and I am wondering whether something similar can be done on a platform with two CPU sockets. In particular, can each CPU hold a model replica, run the training process on its own, and then synchronize the gradients? Thanks.

Theoretically, DDP probably supports two CPUs as long as you initialize a gloo process group, but I think it's better to train on GPUs for the best support.
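
For what it's worth, here is a minimal sketch of what CPU-only DDP with the gloo backend could look like. The toy `nn.Linear` model, tensor shapes, and hyperparameters are just placeholders, and it assumes the rendezvous environment variables (RANK, WORLD_SIZE, MASTER_ADDR, MASTER_PORT) are set by the launcher:

```python
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def main():
    # gloo is the CPU-capable backend; the default env:// rendezvous reads
    # RANK / WORLD_SIZE / MASTER_ADDR / MASTER_PORT from the environment.
    dist.init_process_group(backend="gloo")

    # A toy model kept on CPU; each process holds its own replica.
    model = nn.Linear(10, 1)
    ddp_model = DDP(model)  # no device_ids for CPU training

    loss_fn = nn.MSELoss()
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)

    # One dummy step: gradients are all-reduced across processes during
    # backward(), so the replicas stay in sync.
    inputs = torch.randn(32, 10)
    targets = torch.randn(32, 1)
    loss = loss_fn(ddp_model(inputs), targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```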

It should work just fine. Take a look at the torchrun docs for examples of how to do this: torchrun (Elastic Launch) — PyTorch 2.0 documentation
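
For example, if the sketch above were saved as `train_cpu_ddp.py` (a hypothetical filename), it could be launched with something like `torchrun --nproc_per_node=2 train_cpu_ddp.py`, which spawns two worker processes and exports the variables that `init_process_group` picks up:

```python
import os

# torchrun exports these for every worker it spawns; the default env://
# rendezvous in init_process_group reads them, so no extra wiring is needed.
rank = int(os.environ["RANK"])              # global rank of this worker
local_rank = int(os.environ["LOCAL_RANK"])  # rank on this machine
world_size = int(os.environ["WORLD_SIZE"])  # total number of workers
print(f"worker {rank}/{world_size} (local rank {local_rank}) is up")
```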