I am using DDP for multi-GPU training, and I want to check that I have set it up correctly. To validate, I was going to compare the results of multi-GPU and single-GPU training, but reverting to single-GPU code is cumbersome. So I was wondering: is running on a single GPU inside the DDP setup equivalent to not using DDP at all?
No, it's not exactly the same.
Empirically, wrapping a single-GPU run in DDP adds some overhead (process-group setup and gradient-synchronization bookkeeping).
But computationally it's the same: with a world size of 1 there is nothing to all-reduce, so the gradients, and therefore convergence, are unaffected.
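You can check this numerically without a GPU. The sketch below is an assumption-laden toy (CPU with the gloo backend, a small `nn.Linear` standing in for a real model): it takes one identical optimizer step with a plain module and a DDP-wrapped copy in a single-process group, then compares the parameters.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Single-process group on CPU (gloo), so this runs anywhere.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

# Identical initialization for both models via the same seed.
torch.manual_seed(0)
plain = torch.nn.Linear(4, 2)
torch.manual_seed(0)
wrapped = DDP(torch.nn.Linear(4, 2))  # world_size == 1: all-reduce is a no-op

x = torch.randn(8, 4)
y = torch.randn(8, 2)
loss_fn = torch.nn.MSELoss()

# One identical SGD step on the same batch for each model.
for model in (plain, wrapped):
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    opt.zero_grad()
    loss_fn(model(x), y).backward()
    opt.step()

# Parameters should match: the DDP wrapper changed nothing numerically.
same = all(
    torch.allclose(p, q)
    for p, q in zip(plain.parameters(), wrapped.module.parameters())
)
print(same)

dist.destroy_process_group()
```

If `same` prints `True`, single-process DDP matched plain training bit-for-bit on this step, which is the claim above; any residual difference in a real run would come from data loading or seeding, not from DDP itself.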
Thanks, I'll report back once I have the results. I have started the training.