Is DDP faster than DP in inference time?


I am wondering whether DDP is faster than DP at inference time.
DDP is normally faster than DP in training because of the way it handles gradients, but what about inference, where no gradients are needed?

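Hey @Giang_Nguyen, for inference you don’t need DDP or DP, because their main feature is synchronizing gradients, which only applies to training. You can simply load a copy of the model on each device and give each copy its own share of the inputs.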

If you have to use DDP or DP for inference, then yes, DDP will be faster. This is because DP broadcasts the model weights from one device to all the others at the beginning of every forward pass, while DDP maintains a persistent model replica in each process, so nothing has to be copied per iteration.
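
To make the first point concrete, here is a minimal sketch of inference without any DDP or DP wrapper: each worker holds its own model copy and handles a slice of the batches. The `shard` and `run_inference` helpers, the toy `nn.Linear` model, and the hard-coded `rank` / `world_size` are all assumptions for illustration. In a real multi-GPU job each process would get its rank from the launcher and use its own `cuda` device instead of the CPU.

```python
import torch
import torch.nn as nn


def shard(items, rank, world_size):
    """Return the strided slice of `items` that worker `rank` should handle."""
    return items[rank::world_size]


@torch.no_grad()
def run_inference(model, batches, device):
    """Forward passes only: no gradients are tracked, so nothing needs syncing."""
    model = model.to(device).eval()
    return [model(batch.to(device)).cpu() for batch in batches]


# Hypothetical toy model and data, for illustration only.
model = nn.Linear(4, 2)
batches = [torch.randn(8, 4) for _ in range(6)]

# Assumed values: a real job would read these per process (e.g. from the
# launcher's environment) and pick a matching GPU such as cuda:<rank>.
rank, world_size = 0, 2
device = torch.device("cpu")

outputs = run_inference(model, shard(batches, rank, world_size), device)
print(len(outputs), outputs[0].shape)
```

Since the workers never talk to each other during the forward pass, the only coordination left is splitting the inputs up front and collecting the outputs afterwards.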