Why does DistributedDataParallel intentionally not work when requires_grad=False?

When I wrap a module whose parameters all have requires_grad=False in DDP, I get:

AssertionError: DistributedDataParallel is not needed when a module doesn't have any parameter that requires a gradient.
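
Concretely, here is a minimal sketch of what I mean (the single-process gloo group and the nn.Linear module are just placeholders for illustration, not my actual setup):

```python
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# Single-process group only so the snippet is self-contained.
dist.init_process_group(
    backend="gloo", init_method="tcp://127.0.0.1:29500", rank=0, world_size=1
)

model = nn.Linear(10, 10)
for p in model.parameters():
    p.requires_grad = False  # freeze every parameter

# On the PyTorch version I'm using, this line raises:
# AssertionError: DistributedDataParallel is not needed when a module
# doesn't have any parameter that requires a gradient.
ddp_model = DDP(model)
```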

However, I think it is often useful to use DDP even when no gradients are needed.

Why is this blocked?

I think it’s more reasonable to wrap the code that doesn’t require gradients in `with torch.no_grad():` instead of using DDP.
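
Something like this sketch, with a frozen nn.Linear as a stand-in for your module:

```python
import torch
import torch.nn as nn

# Stand-in for the module whose parameters don't require gradients.
frozen = nn.Linear(10, 10)
frozen.requires_grad_(False)

# Run the forward pass under no_grad so no autograd graph is built;
# there are no gradients for DDP to synchronize anyway.
with torch.no_grad():
    out = frozen(torch.randn(4, 10))
```

Since DDP’s job is to all-reduce gradients across processes after backward, a module with no trainable parameters gives it nothing to do, which is presumably why the wrapper refuses it.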