Is there any difference between wrapping a model in DataParallel on a single GPU and not using DataParallel at all? Specifically, is there any performance difference, or do the two behave the same?
If there is none, I will always use DataParallel, because then I can load a saved model without having to consider whether it was trained with DataParallel (that is, on a single GPU or on multiple GPUs).
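
For context, the loading problem I want to avoid comes from DataParallel keeping the wrapped model under `.module`, so the keys in its state dict start with `module.`. The alternative I know of is to normalize the keys at load time, roughly as in the sketch below. The helper name `strip_module_prefix` and the toy checkpoints are my own placeholders, and plain dicts stand in for real state dicts so the snippet runs without a GPU.

```python
from collections import OrderedDict


def strip_module_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with a leading `module.` removed from each key.

    Keys without the prefix are copied unchanged, so the same call works for
    checkpoints saved with or without the DataParallel wrapper.
    """
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        if key.startswith(prefix):
            key = key[len(prefix):]
        cleaned[key] = value
    return cleaned


# Toy stand-ins for checkpoints (real ones would map names to tensors).
wrapped_ckpt = OrderedDict([("module.fc.weight", 1), ("module.fc.bias", 2)])
plain_ckpt = OrderedDict([("fc.weight", 1), ("fc.bias", 2)])

# Both normalize to the same un-prefixed keys.
print(list(strip_module_prefix(wrapped_ckpt).keys()))
print(list(strip_module_prefix(plain_ckpt).keys()))
```

With a real model this would be something like `model.load_state_dict(strip_module_prefix(torch.load(path)))`, where `path` is a hypothetical checkpoint file and `model` is the unwrapped module. Always wrapping in DataParallel would let me skip this step, which is why I am asking whether it costs anything on one GPU.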