What is the parameter "gradient_as_bucket_view" in DDP


Among the parameters available in DDP, what does “gradient_as_bucket_view” do?
If I set this parameter to True, GPU memory usage goes down. What is the reason?

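gradient_as_bucket_view=True makes each parameter's gradient a view into DDP's allreduce communication buckets. This lets DDP's internal implementation avoid a copy for each parameter gradient, and avoid keeping that second copy of the gradients in memory, thereby reducing memory usage.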
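To make the difference concrete, here is a toy sketch of the two layouts using plain tensors. It is not DDP's actual Reducer code: the shapes and the helpers flatten_into_bucket, copy_back and grads_as_bucket_views are hypothetical, and the allreduce is replaced by an in-place multiply. In the copy-based path the gradients and the bucket are separate buffers, with a copy in each direction. In the view-based path each gradient is just a slice of the bucket, so there is one buffer and no copies. In real code you only pass gradient_as_bucket_view=True to torch.nn.parallel.DistributedDataParallel.

```python
import math
import torch

# Hypothetical parameter shapes for a toy model (illustration only).
SHAPES = [(4, 3), (3,), (2, 2)]
NUMELS = [math.prod(s) for s in SHAPES]
TOTAL = sum(NUMELS)


def flatten_into_bucket(grads):
    """Copy-based path (sketch): each grad is its own tensor, so it is copied
    into a flat bucket before the allreduce. Storage: grads + bucket."""
    bucket = torch.empty(TOTAL)
    offset = 0
    for g in grads:
        n = g.numel()
        bucket[offset:offset + n].copy_(g.reshape(-1))
        offset += n
    return bucket


def copy_back(bucket, grads):
    """Copy-based path (sketch): the reduced values are copied back into each grad."""
    offset = 0
    for g in grads:
        n = g.numel()
        g.copy_(bucket[offset:offset + n].view_as(g))
        offset += n


def grads_as_bucket_views(bucket):
    """View-based path (sketch): each grad is a view at an offset into the
    bucket, so no copies are made and no second gradient buffer exists."""
    views, offset = [], 0
    for shape, n in zip(SHAPES, NUMELS):
        views.append(bucket[offset:offset + n].view(shape))
        offset += n
    return views


# Copy-based: two buffers, plus a copy in and a copy out per step.
grads = [torch.ones(s) for s in SHAPES]
bucket = flatten_into_bucket(grads)
bucket.mul_(0.5)  # stand-in for the allreduce/averaging
copy_back(bucket, grads)

# View-based: one buffer; "backward" writes directly into the bucket.
bucket2 = torch.zeros(TOTAL)
views = grads_as_bucket_views(bucket2)
for v in views:
    v.fill_(1.0)  # stand-in for autograd writing the gradient
bucket2.mul_(0.5)  # the reduced result is already visible through the views
```

The first path is also the answer to "why copy at all": DDP reduces gradients bucket by bucket as flat contiguous buffers, so gradients stored as separate tensors have to be moved into (and back out of) those buffers.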

Oh, thanks.
As an additional question, why does DDP need to copy gradients in the first place?