How does DistributedDataParallel handle parameters whose requires_grad flag is False?

palo_bajo · May 27, 2022, 12:09am

I found this suggestion quite useful. I am facing exactly the same problem described here, but I can't fix it by just destroying the DDP instance and creating a new one. Basically, I have a model wrapped in DDP, and after 10 epochs I want to freeze some parts using requires_grad = False and keep training the rest of the model.

To destroy the DDP model instance I am following "Safely removing a Module from DDP", that is, model = model.module, but it is still not working.
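
For concreteness, here is a minimal sketch of the unwrap, freeze, and rewrap sequence described above, run as a single CPU process with the gloo backend. The helper name `freeze_and_rewrap`, the toy `nn.Sequential` model, the `"0."` prefix, the port number, and the SGD optimizer are all illustrative assumptions, not the actual training code:

```python
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def freeze_and_rewrap(ddp_model, frozen_prefixes):
    """Unwrap a DDP model, freeze matching parameters, and wrap it again.

    `frozen_prefixes` is a hypothetical argument: parameter-name prefixes
    (as reported by named_parameters()) that should stop training.
    """
    model = ddp_model.module  # the underlying nn.Module
    for name, param in model.named_parameters():
        if name.startswith(tuple(frozen_prefixes)):
            param.requires_grad = False
    # Build a fresh wrapper after changing requires_grad, so that DDP
    # sees the new set of trainable parameters at construction time.
    new_ddp = DDP(model)
    # Rebuild the optimizer over trainable parameters only
    # (SGD and lr=0.01 are arbitrary choices for this sketch).
    trainable = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.SGD(trainable, lr=0.01)
    return new_ddp, optimizer


def demo():
    # Single-process "cluster" so the sketch runs without a launcher.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("gloo", rank=0, world_size=1)
    try:
        model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
        ddp_model = DDP(model)
        # Reassigning the name drops our reference to the old wrapper.
        ddp_model, optimizer = freeze_and_rewrap(ddp_model, ["0."])
        loss = ddp_model(torch.randn(3, 4)).sum()
        loss.backward()
        optimizer.step()
        # Report which parameters received a gradient.
        return {
            name: p.grad is not None
            for name, p in ddp_model.module.named_parameters()
        }
    finally:
        dist.destroy_process_group()


if __name__ == "__main__":
    print(demo())
```

In this sketch the flags are changed on the unwrapped module before the new DDP instance is constructed, and the optimizer is rebuilt so it no longer holds the frozen parameters. Whether that ordering matters for the problem here is an assumption worth checking against the real setup.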

Any suggestions?