Is using a single GPU with DDP the same as not using DDP?

I am using DDP for multi-GPU training, and I was wondering if I have set it up correctly. To validate, I was going to compare the results of multi-GPU and single-GPU training. Reverting to single-GPU code is cumbersome, so I was wondering: is using a single GPU in the DDP setting the same as not using DDP at all?

No, it's not exactly the same.
Empirically, a single-GPU run wrapped in DDP carries some extra overhead (process-group setup and the gradient synchronization hooks).

But computationally it is identical: with a world size of 1, the gradient all-reduce averages over a single rank, so wrapping in DDP has no effect on convergence.
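If you want to convince yourself of this without a multi-GPU setup, a minimal single-process sketch like the one below compares a plain module against a DDP-wrapped copy in a world-size-1 process group. The `gloo` backend and the `MASTER_ADDR`/`MASTER_PORT` values are placeholder choices for a local CPU run, not part of the original question's setup.

```python
import copy
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Single-process process group on CPU; address/port are local placeholders
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group(backend="gloo", rank=0, world_size=1)

torch.manual_seed(0)
plain = torch.nn.Linear(4, 2)
ddp = DDP(copy.deepcopy(plain))  # identical initial weights

x = torch.randn(8, 4)
out_plain = plain(x)
out_ddp = ddp(x)  # forward just delegates to the wrapped module

# One backward pass each: with world_size=1 the all-reduce averages
# gradients over a single rank, so they match plain backprop exactly
out_plain.sum().backward()
out_ddp.sum().backward()
grads_match = torch.equal(plain.weight.grad, ddp.module.weight.grad)

dist.destroy_process_group()
```

If the forward outputs and gradients match here, any divergence you see in the real multi-GPU run comes from data sharding, batch-size, or learning-rate differences rather than from DDP itself.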


Thanks, I'll report back once I have the results. I have started the training.