I have a not-that-complex model, but it throws this error when wrapped with DDP:
RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by (1) passing the keyword argument find_unused_parameters=True to torch.nn.parallel.DistributedDataParallel
Then, to find out which param is causing the issue, I turned on find_unused_parameters=True, but the error simply went away instead of showing the list of unused parameters I was expecting. This is PyTorch 1.8. Does anyone have any clues as to why this happens? This may not be specific enough, but I'd like to hear if anyone has faced the same issue.
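For context, here is the kind of single-process check I can run to see which parameters never receive gradients (the model and layer names below are a hypothetical toy example, not my actual model). The idea is that any parameter whose .grad is still None after one backward pass did not participate in producing the loss, which is what DDP complains about:

```python
import torch
import torch.nn as nn

# Toy model with a branch that never contributes to the loss
class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        self.used = nn.Linear(4, 4)
        self.unused = nn.Linear(4, 4)  # defined but never called in forward

    def forward(self, x):
        return self.used(x)

model = Toy()
out = model(torch.randn(2, 4))
out.sum().backward()

# Parameters whose grad is still None never took part in the loss
unused = [name for name, p in model.named_parameters() if p.grad is None]
print(unused)  # → ['unused.weight', 'unused.bias']
```

My understanding (please correct me if wrong) is that with find_unused_parameters=True, DDP detects and skips such parameters during the reducer's bucketing each iteration, so the error disappearing might be expected behavior rather than a bug.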