When I wrap a module whose parameters all have requires_grad=False in DistributedDataParallel (DDP), I get:

AssertionError: DistributedDataParallel is not needed when a module doesn't have any parameter that requires a gradient.
However, I think it is often useful to wrap a module in DDP even when none of its parameters require gradients (e.g., a frozen feature extractor that is part of a larger distributed pipeline). Why is this blocked?