Loss modules and DistributedDataParallel

vadimkantorov · December 17, 2021, 11:28am

Why are loss modules (criterion) often not wrapped by DDP?

Is it because loss modules typically have no learned parameters?

(and thus, if they do have some, they should be wrapped by DDP?)

ptrblck · December 19, 2021, 9:02pm

The loss calculation is performed on each node and doesn’t need any communication. If your loss does need to communicate some gradients (e.g. because it has learned parameters), I guess it could be made part of the model, and would thus be treated like any other nn.Module.
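A minimal sketch of that idea, assuming a criterion with a single learnable parameter: `LearnedTemperatureLoss`, `ModelWithLoss`, and `build_module` are hypothetical names made up for illustration, not PyTorch APIs. Bundling the network and the criterion into one nn.Module means a single DDP wrapper would see the criterion's parameters too and all-reduce their gradients along with the rest.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LearnedTemperatureLoss(nn.Module):
    """Hypothetical criterion with one learnable parameter (a log-temperature)."""

    def __init__(self):
        super().__init__()
        self.log_temp = nn.Parameter(torch.zeros(()))

    def forward(self, logits, targets):
        # Scale the logits by the learned temperature before cross-entropy.
        return F.cross_entropy(logits / self.log_temp.exp(), targets)


class ModelWithLoss(nn.Module):
    """Bundles network and criterion so one wrapper covers both."""

    def __init__(self, model, criterion):
        super().__init__()
        self.model = model
        self.criterion = criterion

    def forward(self, inputs, targets):
        # Returns the scalar loss directly instead of the raw model outputs.
        return self.criterion(self.model(inputs), targets)


def build_module(in_features=8, num_classes=3):
    # Toy sizes, chosen only for illustration.
    return ModelWithLoss(nn.Linear(in_features, num_classes), LearnedTemperatureLoss())


# In a real job, after torch.distributed.init_process_group(...) has run:
#   ddp = torch.nn.parallel.DistributedDataParallel(build_module())
#   loss = ddp(inputs, targets)
#   loss.backward()  # gradients of criterion.log_temp are all-reduced as well
```

A plain criterion such as nn.CrossEntropyLoss has no parameters, so there is nothing for DDP to synchronize and it can stay outside the wrapper.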