Model with unused parameters does not work in DDP

Hi, I have a model whose output is conditional, for example:

if CONDITION_A:
    x = self.fc1(x)
else:
    x = self.fc2(x)

Training works fine on a single GPU. However, on multiple GPUs with DDP it fails with this error:

RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss.

I have seen some people suggest collecting all the outputs, multiplying the unwanted one by 0, and adding it to the loss. Is there a better way to solve this problem? Thanks.
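
For reference, the multiply-by-zero workaround mentioned above might look roughly like the sketch below. The BranchyNet class, the layer sizes, and the grads_after_backward helper are hypothetical and only there to illustrate the idea; they are not from the original model.

```python
import torch
import torch.nn as nn


class BranchyNet(nn.Module):
    """Hypothetical model with two branches, only one of which is wanted."""

    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(8, 4)
        self.fc2 = nn.Linear(8, 4)

    def forward(self, x, condition_a):
        # Run both branches so every parameter appears in the autograd graph.
        out1 = self.fc1(x)
        out2 = self.fc2(x)
        used, unused = (out1, out2) if condition_a else (out2, out1)
        # The zero-weighted term does not change the output values, but it
        # gives the unwanted branch an (all-zero) gradient.
        return used + 0.0 * unused.sum()


def grads_after_backward(condition_a):
    """Run one forward/backward pass and return each parameter's gradient."""
    model = BranchyNet()
    loss = model(torch.randn(2, 8), condition_a).sum()
    loss.backward()
    return {name: p.grad for name, p in model.named_parameters()}
```

With this pattern every parameter receives a gradient on every step, at the cost of computing the unwanted branch each time.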

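Try passing find_unused_parameters=True when you construct the DDP (DistributedDataParallel) wrapper. DDP will then detect, in each iteration, which parameters did not contribute to the output and will not wait for their gradients during reduction.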
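
A minimal sketch of that setup, assuming a hypothetical BranchyNet model shaped like the snippet in the question. To keep it runnable on any machine it uses a single-process gloo group on CPU with a temporary file for rendezvous; the run_two_steps helper and the layer sizes are made up for illustration.

```python
import os
import tempfile

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


class BranchyNet(nn.Module):
    """Hypothetical model: only one of fc1/fc2 is used per forward pass."""

    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(8, 4)
        self.fc2 = nn.Linear(8, 4)

    def forward(self, x, condition_a):
        if condition_a:
            return self.fc1(x)
        return self.fc2(x)


def run_two_steps():
    """Train for two iterations, taking a different branch in each one."""
    # Assumption: one CPU process with the gloo backend, for demonstration only.
    store_path = os.path.join(tempfile.mkdtemp(), "ddp_store")
    dist.init_process_group(
        "gloo", init_method=f"file://{store_path}", rank=0, world_size=1
    )
    try:
        # find_unused_parameters=True makes DDP look for parameters that did
        # not take part in the forward pass and skip waiting on their grads.
        model = DDP(BranchyNet(), find_unused_parameters=True)
        opt = torch.optim.SGD(model.parameters(), lr=0.1)
        losses = []
        for condition_a in (True, False):
            opt.zero_grad()
            loss = model(torch.randn(2, 8), condition_a).sum()
            loss.backward()
            opt.step()
            losses.append(loss.item())
        return losses
    finally:
        dist.destroy_process_group()
```

In a real multi-GPU job you would typically launch one process per GPU (for example with torchrun), use the nccl backend, move the model to the local device, and pass device_ids=[local_rank] to DDP. Note that find_unused_parameters=True adds some overhead, because DDP has to traverse the autograd graph on every iteration.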