How to deallocate the DDP gradient buckets?

Di_Wu1 · May 25, 2022, 12:53pm

How can I deallocate the DDP gradient buckets to reduce memory footprint?

In my model, I want to use my own customized buckets for gradient bucketing instead of the DDP gradient buckets. Since the DDP gradient buckets are roughly the same size as the model parameters, can I deallocate them?

Thanks.

wanchaol · June 7, 2022, 10:34pm

I'm curious how you do the custom bucketing. Are you using a DDP comm hook to implement it? One suggestion: use views of the gradient bucket instead of clones, which might help save some memory.
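For reference, here is a rough sketch of what working on views of the gradient bucket (rather than clones) might look like with a comm hook. It assumes the custom bucketing can be done inside a hook registered via `register_comm_hook`, which the original post does not confirm. The names `allreduce_in_place_hook` and `run_single_process_demo` are made up for illustration, and the single-process `gloo` setup exists only so the snippet runs on one CPU. Note that this does not deallocate the DDP buckets; with `gradient_as_bucket_view=True`, each `param.grad` becomes a view into the bucket buffer, so the gradients and the buckets no longer occupy separate memory.

```python
import os
import tempfile

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def allreduce_in_place_hook(state, bucket):
    """Average gradients by all-reducing DDP's own flat buffer, without cloning it."""
    buf = bucket.buffer()  # 1-D tensor backing the gradients in this bucket
    buf.div_(dist.get_world_size())
    fut = dist.all_reduce(buf, async_op=True).get_future()
    # The hook must return a Future that resolves to the reduced bucket tensor.
    return fut.then(lambda f: f.value()[0])


def run_single_process_demo():
    """Hypothetical one-process run (world_size=1) just to exercise the hook."""
    store_path = os.path.join(tempfile.mkdtemp(), "store")
    dist.init_process_group(
        "gloo", init_method=f"file://{store_path}", rank=0, world_size=1
    )
    try:
        # gradient_as_bucket_view=True makes each param.grad a view into the bucket.
        model = DDP(nn.Linear(4, 2), gradient_as_bucket_view=True)
        model.register_comm_hook(state=None, hook=allreduce_in_place_hook)
        model(torch.randn(8, 4)).sum().backward()
        return model.module.bias.grad.detach().clone()
    finally:
        dist.destroy_process_group()
```

In a real job, the process group would be initialized by the launcher (for example `torchrun`), and the hook body is where custom per-bucket logic would go, operating on `bucket.buffer()` directly instead of on a copy.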