Hi! I’m currently training with DDP (DistributedDataParallel). My model looks like this:
```python
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        ...

    def forward(self, data_dict):
        ...
        return out_dict

    def loss_func(self, data_dict):
        out_dict = self.forward(data_dict)
        # compute_loss is a function that calculates the loss; not important here
        loss = compute_loss(data_dict, out_dict)
        return loss
```
Since `loss_func` is a method of `MyModel`, I have to call `loss = model.module.loss_func(xxx)` in the training code. I was wondering whether PyTorch still performs DDP synchronization in this case (e.g., does it still sync the gradients, and if I call `all_gather` on some output in `out_dict`, can I still gather it across ranks)?
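For concreteness, here is a stripped-down sketch of my training step (assuming the usual `torchrun` launch; the optimizer choice and `dataloader` are placeholders, not my actual setup):

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group(backend="nccl")
local_rank = dist.get_rank() % torch.cuda.device_count()
torch.cuda.set_device(local_rank)

model = DDP(MyModel().cuda(), device_ids=[local_rank])
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for data_dict in dataloader:  # placeholder: a DataLoader with a DistributedSampler
    optimizer.zero_grad()
    # the call in question: it goes through model.module, not model(...)
    loss = model.module.loss_func(data_dict)
    loss.backward()
    optimizer.step()
```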
I saw a related post here, but it’s about DataParallel, not DDP. I understand why that case is different: in DDP each training process is separate, so I won’t run into that particular problem. I just don’t know whether synchronization among processes can still be achieved in my case; the kind of gather I have in mind is sketched below. Thanks in advance!
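A sketch of the gather I mean (the key `"feat"` is a hypothetical example, and I assume the tensor has the same shape on every rank):

```python
import torch
import torch.distributed as dist

def gather_output(feat: torch.Tensor) -> torch.Tensor:
    # all_gather fills a pre-allocated list with one tensor per rank;
    # this assumes feat has the same shape on every rank
    gathered = [torch.zeros_like(feat) for _ in range(dist.get_world_size())]
    dist.all_gather(gathered, feat)
    return torch.cat(gathered, dim=0)

# e.g. inside the training step, with a hypothetical output key "feat":
# all_feats = gather_output(out_dict["feat"])
```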