Hi, I have a model with conditioned output, for example,
if CONDITION_A:
    x = self.fc1(x)
else:
    x = self.fc2(x)
Training works well on a single GPU. However, on multiple GPUs with DDP it raises the error
RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss.
I have seen some people suggest collecting the outputs of all branches, multiplying the unwanted one by 0, and adding it to the loss. I am wondering, is there a better way to solve this problem? Thanks.
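For context, here is a minimal sketch of that suggested workaround as I understand it (the module and names are illustrative, not from my actual model): both branches are evaluated, and the unused one is multiplied by 0 so its parameters still appear in the autograd graph.

```python
import torch
import torch.nn as nn

class CondModel(nn.Module):
    """Illustrative model with a condition-dependent output branch."""
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(4, 4)
        self.fc2 = nn.Linear(4, 4)

    def forward(self, x, condition_a: bool):
        out1 = self.fc1(x)
        out2 = self.fc2(x)
        if condition_a:
            # fc2's output contributes 0 to the result, but its
            # parameters still participate in the backward graph,
            # so DDP sees gradients for every parameter.
            return out1 + 0.0 * out2.sum()
        return out2 + 0.0 * out1.sum()

model = CondModel()
x = torch.randn(2, 4)
loss = model(x, condition_a=True).sum()
loss.backward()
# Every parameter now has a gradient (zero for the unused branch).
print(all(p.grad is not None for p in model.parameters()))
```

The downside, as far as I can tell, is that both branches are computed on every forward pass even though only one is needed.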