Loss modules and DistributedDataParallel

Why are loss modules (criterion) often not wrapped by DDP?

Is it because loss modules typically have no learned parameters?

(And thus, if they do have learned parameters, should they be wrapped by DDP as well?)

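The loss calculation is performed on each node independently and doesn’t need any communication: each process computes the loss on its own local batch, and DDP synchronizes the gradients of the wrapped module’s parameters during the backward pass. In case your loss has learnable parameters whose gradients need to be synchronized, I guess it could be made part of the model (and would thus be treated like any other nn.Module).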
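To illustrate the second case, here is a minimal sketch of folding a criterion with a learnable parameter into the module that gets wrapped by DDP. The names `WeightedMSELoss`, `ModelWithLoss`, and `wrap_for_ddp` are made up for this example and are not part of PyTorch; the loss formula is only a placeholder for "some loss that has a parameter".

```python
import torch
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


class WeightedMSELoss(nn.Module):
    """Hypothetical criterion with a single learnable parameter."""

    def __init__(self):
        super().__init__()
        # Learnable log-scale, initialized to 0 (scalar tensor).
        self.log_scale = nn.Parameter(torch.zeros(()))

    def forward(self, pred, target):
        mse = torch.mean((pred - target) ** 2)
        # exp(-s) * mse + s: placeholder formula that depends on the parameter.
        return torch.exp(-self.log_scale) * mse + self.log_scale


class ModelWithLoss(nn.Module):
    """Hypothetical wrapper: forward() returns the loss directly, so the
    criterion's parameters are registered as parameters of this module."""

    def __init__(self, model, criterion):
        super().__init__()
        self.model = model
        self.criterion = criterion

    def forward(self, inputs, target):
        return self.criterion(self.model(inputs), target)


def wrap_for_ddp(model, criterion):
    # Assumes torch.distributed.init_process_group() has already been called
    # and that the modules are already on the right device. For a
    # single-GPU-per-process setup you would typically also pass device_ids.
    return DDP(ModelWithLoss(model, criterion))


# A standard criterion has no parameters, so there is nothing for DDP to sync.
print(len(list(nn.MSELoss().parameters())))
```

In a training loop you would then call `loss = ddp_module(inputs, target)` followed by `loss.backward()`, and the criterion's parameter gradients would be averaged across processes together with the model's. The optimizer should be built from `ddp_module.parameters()` so that it also updates the criterion's parameter.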