Hi, I have a model with a conditional output, for example:

if CONDITION_A:
    x = self.fc1(x)
else:
    x = self.fc2(x)
Training works well on a single GPU. However, on multiple GPUs with DDP it gives the error:
RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss.
I have seen some people suggest collecting all outputs, multiplying the unwanted one by 0, and adding it to the loss. I am wondering: is there a better way to solve this problem? Thanks.
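For reference, here is a minimal sketch of the kind of model I mean (class and parameter names are just illustrative). Only one branch runs per forward pass, so the other branch's parameters get no gradient, which is what DDP complains about:

```python
import torch
import torch.nn as nn

class ConditionalNet(nn.Module):
    """Illustrative model: output depends on a runtime condition."""
    def __init__(self, in_features=8, out_features=4):
        super().__init__()
        self.fc1 = nn.Linear(in_features, out_features)
        self.fc2 = nn.Linear(in_features, out_features)

    def forward(self, x, condition_a: bool):
        # Only one of fc1/fc2 participates in the loss each step,
        # so the unused branch's parameters receive no gradient.
        if condition_a:
            return self.fc1(x)
        return self.fc2(x)

net = ConditionalNet()
out = net(torch.randn(2, 8), condition_a=True)
print(out.shape)  # torch.Size([2, 4])
```

On a single GPU this is fine; under DDP the gradient reduction expects every registered parameter to have been used.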