I have a not-that-complex model, but it throws this error when wrapped with DDP:
RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by (1) passing the keyword argument find_unused_parameters=True to torch.nn.parallel.DistributedDataParallel
Then, to find out which param is causing the issue, I turned on find_unused_parameters=True, but the error simply went away instead of showing the list of unused parameters I was expecting. This is PyTorch 1.8. Does anyone have any clues as to why this happens? This may not be specific enough, but I'd like to hear if anyone has faced the same issue.
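For context, here is the kind of single-process check I can run to see which parameters never receive gradients (the model and layer names below are a hypothetical toy example, not my actual model). The idea is that any parameter whose .grad is still None after one backward pass did not participate in producing the loss, which is what DDP complains about:

```python
import torch
import torch.nn as nn

# Toy model with a branch that never contributes to the loss
class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        self.used = nn.Linear(4, 4)
        self.unused = nn.Linear(4, 4)  # defined but never called in forward

    def forward(self, x):
        return self.used(x)

model = Toy()
out = model(torch.randn(2, 4))
out.sum().backward()

# Parameters whose grad is still None never took part in the loss
unused = [name for name, p in model.named_parameters() if p.grad is None]
print(unused)  # → ['unused.weight', 'unused.bias']
```

My understanding (please correct me if wrong) is that with find_unused_parameters=True, DDP detects and skips such parameters during the reducer's bucketing each iteration, so the error disappearing might be expected behavior rather than a bug.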