Hi,
When I accidentally wrapped the model twice in the DistributedDataParallel class, each epoch took longer to complete than when I wrapped it only once.
For example:
model = DistributedDataParallel(model)
# Maybe load some checkpoint
model = DistributedDataParallel(model)
is slower than
model = DistributedDataParallel(model)
# Maybe load some checkpoint
My setup:
4 nodes with 1 GPU each.
I know there is no reason to wrap a model twice, but I was wondering why it takes longer (significantly longer, in my case).
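For reference, a guard along these lines would avoid the accidental second wrap. This is only a minimal sketch: `wrap_once` and `unwrap` are hypothetical helper names, not part of PyTorch, and the only behavior it relies on is that DistributedDataParallel exposes the wrapped model as `.module`. It does not explain the slowdown itself.

```python
def wrap_once(model, wrapper_cls, **wrapper_kwargs):
    """Return `model` wrapped in `wrapper_cls`, unless it is already wrapped.

    Hypothetical helper: `wrapper_cls` is expected to be a class such as
    torch.nn.parallel.DistributedDataParallel.
    """
    if isinstance(model, wrapper_cls):
        # Already wrapped: return it unchanged instead of nesting wrappers.
        return model
    return wrapper_cls(model, **wrapper_kwargs)


def unwrap(model, wrapper_cls):
    """Peel off every layer of `wrapper_cls` and return the inner model.

    Assumes the wrapper stores the wrapped model in a `.module` attribute,
    as DistributedDataParallel does.
    """
    while isinstance(model, wrapper_cls):
        model = model.module
    return model


# Assumed usage with PyTorch (needs an initialized process group):
#   from torch.nn.parallel import DistributedDataParallel
#   model = wrap_once(model, DistributedDataParallel)
#   # Maybe load some checkpoint
#   model = wrap_once(model, DistributedDataParallel)  # no second wrap
```

Calling `unwrap(model, DistributedDataParallel)` on an already double-wrapped model would give back the original module, which can then be wrapped once.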
Thanks