Why does DistributedDataParallel intentionally not work when requires_grad=False?

When I wrap a module whose parameters all have requires_grad=False in DDP, I get:

AssertionError: DistributedDataParallel is not needed when a module doesn't have any parameter that requires a gradient.
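
Concretely, here is a minimal sketch of what I mean (the single-process gloo group and the nn.Linear module are just placeholders for illustration, not my actual setup):

```python
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# Single-process group only so the snippet is self-contained.
dist.init_process_group(
    backend="gloo", init_method="tcp://127.0.0.1:29500", rank=0, world_size=1
)

model = nn.Linear(10, 10)
for p in model.parameters():
    p.requires_grad = False  # freeze every parameter

# On the PyTorch version I'm using, this line raises:
# AssertionError: DistributedDataParallel is not needed when a module
# doesn't have any parameter that requires a gradient.
ddp_model = DDP(model)
```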

However, I think it is often useful to use DDP even when no gradients are needed.

Why is this blocked?

I think it’s more reasonable to wrap the code that doesn’t require gradients in `with torch.no_grad():` instead of using DDP.
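
Something like this sketch, with a frozen nn.Linear as a stand-in for your module:

```python
import torch
import torch.nn as nn

# Stand-in for the module whose parameters don't require gradients.
frozen = nn.Linear(10, 10)
frozen.requires_grad_(False)

# Run the forward pass under no_grad so no autograd graph is built;
# there are no gradients for DDP to synchronize anyway.
with torch.no_grad():
    out = frozen(torch.randn(4, 10))
```

Since DDP’s job is to all-reduce gradients across processes after backward, a module with no trainable parameters gives it nothing to do, which is presumably why the wrapper refuses it.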