Can I access and call submodules of a DDP model during training?

Suppose I have a module that combines some submodules, like:

import torch.nn as nn

class Comb(nn.Module):
    def __init__(self, encoder, decoder):
        super(Comb, self).__init__()
        self.encoder = encoder
        self.decoder = decoder

    def forward(self, x):
        return self.decoder(self.encoder(x))

For distributed training, I wrap the module with DDP:

from torch.nn.parallel import DistributedDataParallel as DDP

comb = DDP(comb)
raw_comb = comb.module  # the underlying, unwrapped nn.Module

Now, in the training loop, I want to call the submodules of comb directly, like:

feature = raw_comb.encoder(input)  # calls the submodule directly, bypassing DDP's forward
out = raw_comb.decoder(feature)
loss = loss_fn(out, label)
loss.backward()

Will this behavior break DDP's gradient synchronization?

Yes. If you access the internal .module directly, the DDP wrapper's forward is bypassed, so you execute the raw model on each device separately and the gradients will not be synchronized across the processes.
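A common way around this, if you need the intermediate features, is to keep every call inside forward and invoke the DDP-wrapped model itself. A minimal sketch (assuming you are free to change Comb.forward; the two-output forward is just an illustration, not something from the original post):

class Comb(nn.Module):
    def __init__(self, encoder, decoder):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder

    def forward(self, x):
        feature = self.encoder(x)
        out = self.decoder(feature)
        # return the intermediate feature too, so callers never have to
        # reach into the submodules directly
        return out, feature

comb = DDP(comb)

out, feature = comb(input)   # forward runs through the DDP wrapper
loss = loss_fn(out, label)
loss.backward()              # DDP's hooks all-reduce the gradients here

This way the synchronization machinery set up by the DDP forward is still used.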

Your suggestion saved me 2 hours of debugging, not to mention that I don't know how to check the gradient sync. Thank you.
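In case it helps, one rough way to check the gradient sync (my own sketch, not something from this thread) is to compare each local gradient with the cross-rank average after backward; with DDP working correctly they should already match:

import torch
import torch.distributed as dist

def grads_are_synced(model, atol=1e-6):
    # After loss.backward() on a DDP model, every rank should already hold
    # the averaged gradients, so the local grad should equal the cross-rank mean.
    world_size = dist.get_world_size()
    for p in model.parameters():
        if p.grad is None:
            continue
        mean_grad = p.grad.detach().clone()
        dist.all_reduce(mean_grad, op=dist.ReduceOp.SUM)
        mean_grad /= world_size
        if not torch.allclose(p.grad, mean_grad, atol=atol):
            return False
    return True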

If you want to dig a bit into the code and see how DDP works, check this post, where I've shared a few references showing how the hooks are used and where the internal module is called.
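To give a rough sense of the mechanism: DDP registers autograd hooks on the parameters and reduces the gradients across ranks during backward. A toy sketch of that idea (deliberately simplified, not DDP's actual implementation, which buckets gradients and overlaps communication with compute):

import torch.distributed as dist

def attach_naive_sync_hooks(model):
    # Register a hook on every parameter that averages its gradient across
    # ranks as soon as the gradient is produced during backward.
    world_size = dist.get_world_size()

    def make_hook():
        def hook(grad):
            synced = grad.clone()
            dist.all_reduce(synced, op=dist.ReduceOp.SUM)
            return synced / world_size
        return hook

    for p in model.parameters():
        if p.requires_grad:
            p.register_hook(make_hook())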
