Initialising a DDP model twice slows down training

Hi,
When I accidentally wrapped the model in the DistributedDataParallel class twice, each epoch started taking longer to complete compared to wrapping it once.

For example:

from torch.nn.parallel import DistributedDataParallel

model = DistributedDataParallel(model)
# Maybe load some checkpoint
model = DistributedDataParallel(model)  # accidental second wrap

is slower than

model = DistributedDataParallel(model)
# Maybe load some checkpoint

My Setup:
4 nodes with 1 GPU each.
I know there is no reason to wrap a model twice, but I was wondering why it takes longer (significantly longer, in my case).
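
For reference, one way to avoid the accidental second wrap when resuming from a checkpoint is to guard the wrapping and load the state dict into the underlying module; a rough sketch (the checkpoint file name and key are just placeholders):

import torch
from torch.nn.parallel import DistributedDataParallel as DDP

# Wrap once, and only if the model is not already wrapped.
if not isinstance(model, DDP):
    model = DDP(model)

# Resume by loading into the underlying module, so no second wrap is needed.
checkpoint = torch.load("checkpoint.pt", map_location="cpu")
model.module.load_state_dict(checkpoint["model_state"])  # key is a placeholder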

Thanks

@ptrblck if you could shed some light on this, it would be really helpful.
Thanks.

I don’t know what would happen if you wrapped the model in DDP again.
Could you create a full (Nsight Systems) profile and compare the timelines of both runs to check if anything stands out?
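
If a full Nsight Systems trace is too heavy as a first step, profiling a few iterations of each run with torch.profiler can already show where the extra time goes (a rough sketch; loader and train_step are placeholders):

import torch
from torch.profiler import profile, ProfilerActivity

# Profile a few training iterations; run this once for the single-wrap
# model and once for the double-wrapped one, then compare the results.
with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    for step, batch in enumerate(loader):   # loader is a placeholder
        train_step(model, batch)            # train_step is a placeholder
        if step >= 10:
            break

print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=20))
prof.export_chrome_trace("ddp_profile.json")  # open in chrome://tracing or Perfetto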