How to convert a single-GPU PyTorch script to a multi-GPU multi-node PyTorch script with DDP?

I have read this 10 times already, but honestly it's not really helpful; what I need is a place that lists the modifications needed to convert single-GPU code to multi-node, multi-GPU code.

Is there a place in the doc that explains how to distribute a PyTorch training script over multiple machines?


Check out these resources. They helped me understand how to do it. I agree that the docs as of now are not to the point.
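To summarize what those resources cover, the usual changes are: initialize a process group, shard the data with `DistributedSampler`, move the model to the local device and wrap it in `DistributedDataParallel`, and launch one process per GPU with `torchrun`. Here is a minimal sketch; the toy model, dataset, and hyperparameters are placeholders, and the single-process env-var defaults are only there so the sketch also runs standalone (under `torchrun` they are overridden):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

def main():
    # Defaults so the sketch runs as a single process without torchrun;
    # torchrun sets RANK, WORLD_SIZE, LOCAL_RANK, MASTER_ADDR, MASTER_PORT.
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "29500")
    os.environ.setdefault("RANK", "0")
    os.environ.setdefault("WORLD_SIZE", "1")

    # 1. Initialize the process group (NCCL for GPUs, Gloo for CPU).
    dist.init_process_group(
        backend="nccl" if torch.cuda.is_available() else "gloo"
    )
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    device = torch.device(
        f"cuda:{local_rank}" if torch.cuda.is_available() else "cpu"
    )

    # 2. Shard the dataset across processes with DistributedSampler
    #    (a random toy dataset stands in for your real one).
    dataset = TensorDataset(torch.randn(64, 10), torch.randn(64, 1))
    sampler = DistributedSampler(dataset)
    loader = DataLoader(dataset, batch_size=8, sampler=sampler)

    # 3. Move the model to this process's device, then wrap it in DDP.
    model = torch.nn.Linear(10, 1).to(device)
    model = DDP(
        model,
        device_ids=[local_rank] if torch.cuda.is_available() else None,
    )
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    loss = None
    for epoch in range(2):
        # 4. Reshuffle the shards each epoch.
        sampler.set_epoch(epoch)
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            loss = torch.nn.functional.mse_loss(model(x), y)
            optimizer.zero_grad()
            loss.backward()  # DDP all-reduces gradients here.
            optimizer.step()

    dist.destroy_process_group()
    return float(loss.item())

if __name__ == "__main__":
    main()
```

To run across two machines with eight GPUs each, you would launch the same script on both nodes with something like `torchrun --nnodes=2 --nproc_per_node=8 --rdzv_backend=c10d --rdzv_endpoint=MASTER_HOST:PORT train.py` (host, port, and counts are placeholders for your cluster).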

Hey @Olivier-CR, do you mind opening an issue in the PyTorch repository for this? There is definitely room for improvement in our documentation and this would help us to prioritize it in the near future. Thanks!