I want to use LayerDrop. Unfortunately, when I do, I get an error:
Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. [...]
I thought this could be caused by the fact that I'm using DDP: different workers were dropping different layers, leading to problems when syncing gradients. However, I set the seed so that the different workers should be dropping the same layers at the same step, and I still get the error.
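For reference, here's a simplified, torch-free sketch of how I derive the drop decisions (the helper name and parameters are just for illustration): each rank seeds an RNG identically, so every rank should compute the same drop mask at a given step.

```python
import random

def dropped_layers(seed, step, num_layers=12, p_drop=0.2):
    # Hypothetical helper: each DDP rank derives the drop mask
    # from the shared seed plus the current step, so all ranks
    # should agree on which layers are skipped this iteration.
    rng = random.Random(f"{seed}-{step}")
    return [i for i in range(num_layers) if rng.random() < p_drop]

# Two "ranks" with the same seed produce identical masks at step 3:
rank0_mask = dropped_layers(seed=0, step=3)
rank1_mask = dropped_layers(seed=0, step=3)
print(rank0_mask == rank1_mask)  # True
```

So as far as I can tell, the masks really are identical across workers, yet DDP still complains about unused parameters.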
Why is this happening?