Do I need to use a distributed/modified optimizer with DistributedDataParallel

I went through this tutorial here:
https://pytorch.org/tutorials/intermediate/ddp_tutorial.html

My setup is a single multi-gpu machine.

I am wondering, does the optimizer need to be changed in any way to account for the
multiple processes?

Also, should optimizer.step() only be called by the process at rank 0?

No, you don’t need to change how you use the optimizer. With DDP, every rank should call step(), not only rank 0, since each process holds its own model replica and applies the (already all-reduced) gradients to keep the replicas identical.
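
A minimal sketch of what that looks like on a single multi-GPU machine, assuming the script is launched with torchrun (the model, data, and hyperparameters here are just placeholders):

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK, LOCAL_RANK and WORLD_SIZE for each spawned process
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = nn.Linear(10, 1).cuda(local_rank)
    ddp_model = DDP(model, device_ids=[local_rank])

    # a plain, unmodified optimizer, constructed on every rank
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    loss_fn = nn.MSELoss()

    for _ in range(10):
        inputs = torch.randn(32, 10, device=local_rank)
        targets = torch.randn(32, 1, device=local_rank)

        optimizer.zero_grad()
        loss = loss_fn(ddp_model(inputs), targets)
        # backward() triggers DDP's gradient all-reduce, so every rank
        # ends up with identical gradients...
        loss.backward()
        # ...and every rank calls step(), keeping the replicas in sync
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched with something like `torchrun --nproc_per_node=4 train.py`, all four processes run the same loop and call optimizer.step() themselves; no rank-0 special-casing is needed for the optimizer.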
