What is the parameter "gradient_as_bucket_view" in DDP

coaksidenc · August 15, 2022, 6:38am

Hello

Among the parameters available in DDP, what does “gradient_as_bucket_view” do?
When I set this parameter to True, GPU memory usage goes down. Why is that?

rvarm1 · August 16, 2022, 2:13am

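With gradient_as_bucket_view=True, each parameter's .grad becomes a view into DDP's allreduce communication buckets instead of a separate tensor. DDP then no longer has to keep (or copy) a second buffer for every gradient, which reduces peak memory by roughly the total size of the gradients.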
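
To make the copy-versus-view difference concrete, here is a minimal conceptual sketch using only the Python standard library. It is not DDP's actual implementation: the `bucket`, `grad_copy`, and `grad_view` names and the buffer layout are assumptions made up for illustration, and the in-place write merely stands in for an allreduce.

```python
from array import array

# One flat float32 buffer standing in for a communication bucket.
# Hypothetical layout: slots 0-3 hold a "weight" gradient, slots 4-5 a "bias" gradient.
bucket = array("f", [0.0] * 6)

# Copy mode: slicing an array allocates new storage, so the gradient
# values exist twice and must be copied to and from the bucket.
grad_copy = bucket[0:4]

# View mode: a memoryview slice is a window onto the bucket's own memory,
# so there is no second allocation and nothing to copy.
grad_view = memoryview(bucket)[0:4]

# Simulate the bucket being updated in place (stand-in for a reduction).
bucket[0] = 1.5

print(grad_copy[0])  # 0.0: the separate copy is stale until copied back
print(grad_view[0])  # 1.5: the view sees the update immediately
```

In PyTorch itself the option is passed when wrapping the model, for example `DistributedDataParallel(model, gradient_as_bucket_view=True)`.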

coaksidenc · August 16, 2022, 7:29am

Oh, thanks.
As a follow-up question, why does DDP need to copy gradients in the first place?