Can I access and call submodules of a DDP model during training?

Suppose I have a module that combines some submodules, like:

import torch.nn as nn

class Comb(nn.Module):
    def __init__(self, encoder, decoder):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder

When running distributed training, I wrap the module in DDP:

from torch.nn.parallel import DistributedDataParallel as DDP

comb = DDP(comb)
raw_comb = comb.module

Now, in the training loop, I want to call the submodules of comb directly, like:

feature = raw_comb.encoder(input)
out = raw_comb.decoder(feature)
loss = loss_fn(out, label)
loss.backward()

Will this behavior break the DDP module gradient sync?

Yes, if you access the internal .module, the DDP wrapper won’t be used and you would execute the raw model on each device separately, so the gradients won’t be synchronized across ranks.
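
A minimal sketch of the usual fix, assuming Comb also defines a forward that chains the encoder and decoder (this forward is not in the original snippet): call the DDP-wrapped model itself, so that its autograd hooks run and the gradients are all-reduced during backward.

# assumed addition to Comb (not in the original snippet):
#     def forward(self, x):
#         return self.decoder(self.encoder(x))

out = comb(input)            # goes through the DDP wrapper, so its hooks fire
loss = loss_fn(out, label)
loss.backward()              # gradients are averaged across ranks here

If you need the intermediate feature, you can return it from forward alongside the output instead of calling the submodules directly.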

Your suggestion saved me two hours of debugging, especially since I don’t know how to check whether the gradients are in sync. Thank you.

If you want to dig a bit into the code and see how DDP works, check this post, where I’ve shared a few references showing how the hooks are used and where the internal module is called.
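
If you want to convince yourself that the sync actually happened, here is an illustrative sketch (not an official API) for checking it right after backward(); it assumes the process group is already initialized, e.g. when launched with torchrun:

import torch
import torch.distributed as dist

def grads_in_sync(model, atol=1e-6):
    # after a correct DDP backward, every rank should already hold the averaged
    # gradients, so each local gradient should equal the cross-rank average
    for p in model.parameters():
        if p.grad is None:
            continue
        avg = p.grad.detach().clone()
        dist.all_reduce(avg, op=dist.ReduceOp.SUM)
        avg /= dist.get_world_size()
        if not torch.allclose(p.grad, avg, atol=atol):
            return False
    return True

If the submodules are called through .module as in the original snippet, this check will typically report a mismatch, because no all-reduce ran during backward.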
