Do I need to use a distributed/modified optimizer with DistributedDataParallel

I went through this tutorial here:
https://pytorch.org/tutorials/intermediate/ddp_tutorial.html

My setup is a single multi-gpu machine.

I am wondering, does the optimizer need to be changed in any way to account for the
multiple processes?

Also, should optimizer.step() only be called by the process at rank 0?

No, you don’t need to change how you use the optimizer. With DDP, every rank should call step(), not only rank 0, since each process holds its own model replica and applies the (already all-reduced) gradients to keep the replicas identical.
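
A minimal sketch of what that looks like on a single multi-GPU machine, assuming the script is launched with torchrun (the model, data, and hyperparameters here are just placeholders):

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK, LOCAL_RANK and WORLD_SIZE for each spawned process
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = nn.Linear(10, 1).cuda(local_rank)
    ddp_model = DDP(model, device_ids=[local_rank])

    # a plain, unmodified optimizer, constructed on every rank
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    loss_fn = nn.MSELoss()

    for _ in range(10):
        inputs = torch.randn(32, 10, device=local_rank)
        targets = torch.randn(32, 1, device=local_rank)

        optimizer.zero_grad()
        loss = loss_fn(ddp_model(inputs), targets)
        # backward() triggers DDP's gradient all-reduce, so every rank
        # ends up with identical gradients...
        loss.backward()
        # ...and every rank calls step(), keeping the replicas in sync
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched with something like `torchrun --nproc_per_node=4 train.py`, all four processes run the same loop and call optimizer.step() themselves; no rank-0 special-casing is needed for the optimizer.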
