find_unused_parameters=True fixes an error

I have a not-that-complex model, but it raises this error when wrapped with DDP:

RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by (1) passing the keyword argument find_unused_parameters=True to torch.nn.parallel.DistributedDataParallel;

Then, to find out which param is causing the issue, I turned on find_unused_parameters=True, and the error went away, while I was expecting to see a list of the unused parameters. This is PyTorch 1.8. Does anyone have any clue why this happens? This may not be specific enough, but I'd like to see if anyone has faced the same issue.
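For context, here is a minimal sketch of the kind of setup I mean (the real model is different; ToyModel and its layer names are made up for illustration). The "unused" layer never participates in the forward pass, so with the default find_unused_parameters=False the reducer waits for a gradient that never arrives. Since that check is local bookkeeping, a single-process gloo group should be enough to see the error without torchrun, though normally you'd hit it in a multi-process launch:

```python
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.used = nn.Linear(8, 8)
        self.unused = nn.Linear(8, 8)  # defined but never called in forward()

    def forward(self, x):
        return self.used(x)  # self.unused receives no gradient


def main():
    # Single-process process group just so DDP can be constructed here.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("gloo", rank=0, world_size=1)

    model = DDP(ToyModel())  # find_unused_parameters defaults to False
    opt = torch.optim.SGD(model.parameters(), lr=0.1)

    for _ in range(2):  # DDP raises the RuntimeError on the second iteration
        opt.zero_grad()
        loss = model(torch.randn(4, 8)).sum()
        loss.backward()
        opt.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```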

find_unused_parameters=True makes DDP detect parameters that did not take part in producing the loss, mark them as ready for reduction, and keep the gradient sync consistent, so it fixes the error rather than just reporting the offending parameters. In PyTorch 1.9, if your application has unused parameters and you leave find_unused_parameters=False, the error message includes the indices of the unused parameters.
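If you want the parameter names rather than indices, one version-independent trick (a rough sketch, not specific to DDP) is to run a single forward/backward on the unwrapped module and list the parameters whose .grad is still None; those are the ones that never contribute to the loss, which is what DDP is complaining about:

```python
import torch
import torch.nn as nn


def find_params_without_grad(model: nn.Module, sample_input: torch.Tensor):
    """Run one forward/backward and return names of parameters that got no grad."""
    model.zero_grad(set_to_none=True)
    loss = model(sample_input).sum()  # assumes a single-tensor input and output
    loss.backward()
    return [name for name, p in model.named_parameters() if p.grad is None]


# Example with the made-up ToyModel from the sketch above:
# print(find_params_without_grad(ToyModel(), torch.randn(4, 8)))
# -> ['unused.weight', 'unused.bias']
```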

Good to know, thanks!