Debugging DataParallel, no speedup and uneven memory allocation

DataParallel also distributes the backward pass; that part is hidden inside autograd, so you won't see it in your own code. On every iteration DataParallel has to broadcast all the parameters to each GPU and reduce the gradients back, so parallelization efficiency drops when your computation time is small and you have a lot of parameters.
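To make the trade-off concrete, here is a rough back-of-the-envelope model in plain Python. It is only a sketch, not a measurement: the function name `estimated_speedup`, the 10 GB/s inter-GPU bandwidth, and the 4 bytes per parameter are all assumptions chosen for illustration, and the model ignores input scatter, output gather, and any overlap between communication and compute.

```python
def estimated_speedup(compute_s, num_params, num_gpus,
                      bytes_per_param=4, bandwidth_bytes_per_s=10e9):
    """Toy estimate of DataParallel speedup over a single GPU.

    compute_s: forward + backward time for the whole batch on one GPU.
    bandwidth_bytes_per_s: assumed inter-GPU bandwidth (hypothetical value).
    """
    if num_gpus == 1:
        return 1.0  # nothing to broadcast or reduce
    # Two transfers per iteration: parameters out, gradients back.
    comm_s = 2 * num_params * bytes_per_param / bandwidth_bytes_per_s
    # Compute is split across GPUs; communication is modelled as pure overhead.
    parallel_s = compute_s / num_gpus + comm_s
    return compute_s / parallel_s


# Heavy compute, few parameters: communication is negligible.
heavy = estimated_speedup(compute_s=1.0, num_params=1_000_000, num_gpus=4)

# Light compute, many parameters: communication dominates.
light = estimated_speedup(compute_s=0.01, num_params=100_000_000, num_gpus=4)

print(f"heavy compute: {heavy:.2f}x, light compute: {light:.2f}x")
```

Under these assumed numbers the first case gets close to the ideal 4x, while the second comes out slower than a single GPU. If your model looks like the second case, a larger batch size or more work per parameter is what moves the ratio.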
