Hi! I’m currently training with DDP (DistributedDataParallel). My model looks like this:
```python
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        ...

    def forward(self, data_dict):
        ...
        return out_dict

    def loss_func(self, data_dict):
        out_dict = self.forward(data_dict)
        # compute_loss is a function that calculates the loss; not important here
        loss = compute_loss(data_dict, out_dict)
        return loss
```
Since `loss_func` is a method of `MyModel`, I have to call `loss = model.module.loss_func(xxx)` in the training code. I was wondering whether PyTorch still performs DDP synchronization in this case (e.g., does it still sync the gradients, and if I call `all_gather` on some output in `out_dict`, can I still gather it across ranks)?
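For concreteness, here is a stripped-down sketch of my training step (assuming the usual `torchrun` launch; the optimizer choice and `dataloader` are placeholders, not my actual setup):

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group(backend="nccl")
local_rank = dist.get_rank() % torch.cuda.device_count()
torch.cuda.set_device(local_rank)

model = DDP(MyModel().cuda(), device_ids=[local_rank])
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for data_dict in dataloader:  # placeholder: a DataLoader with a DistributedSampler
    optimizer.zero_grad()
    # the call in question: it goes through model.module, not model(...)
    loss = model.module.loss_func(data_dict)
    loss.backward()
    optimizer.step()
```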
I saw a related post here, but it’s about DataParallel, not DDP. I understand why that case is different: in DDP each training process is separate, so I won’t run into that particular problem. I just don’t know whether synchronization among processes can still be achieved in my case; the kind of gather I have in mind is sketched below. Thanks in advance!
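A sketch of the gather I mean (the key `"feat"` is a hypothetical example, and I assume the tensor has the same shape on every rank):

```python
import torch
import torch.distributed as dist

def gather_output(feat: torch.Tensor) -> torch.Tensor:
    # all_gather fills a pre-allocated list with one tensor per rank;
    # this assumes feat has the same shape on every rank
    gathered = [torch.zeros_like(feat) for _ in range(dist.get_world_size())]
    dist.all_gather(gathered, feat)
    return torch.cat(gathered, dim=0)

# e.g. inside the training step, with a hypothetical output key "feat":
# all_feats = gather_output(out_dict["feat"])
```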