How can I deallocate the DDP gradient buckets to reduce memory footprint?
In my model, I want to use my own custom buckets for gradient bucketing instead of DDP's gradient buckets. Can I deallocate the DDP gradient buckets, which are roughly the same size as the model parameters?
Could you share how you do the custom bucketing? Are you perhaps using a DDP comm hook to implement it? One suggestion: use views of the gradient bucket instead of clones, which might help save some memory.
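To make the "views instead of clones" idea concrete, here is a minimal sketch of a comm hook that works in place on DDP's own bucket buffer. It assumes the custom logic can run directly on the flat tensor returned by `bucket.buffer()`. The names `inplace_allreduce_hook` and `run_single_process_demo` are made up for illustration, and the single-process gloo setup is only there so the snippet can run on CPU. Passing `gradient_as_bucket_view=True` to DDP makes each `param.grad` alias the bucket memory, so gradients and buckets are not stored twice.

```python
import os
import tempfile

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def inplace_allreduce_hook(state, bucket):
    """Average gradients in place on DDP's flat bucket buffer (no clone)."""
    group = state if state is not None else dist.group.WORLD
    flat = bucket.buffer()  # 1-D tensor backing the whole bucket
    flat.div_(dist.get_world_size(group))  # in-place, no extra allocation
    fut = dist.all_reduce(flat, group=group, async_op=True).get_future()
    # DDP expects the future to resolve to the reduced bucket tensor.
    return fut.then(lambda f: f.value()[0])


def run_single_process_demo():
    """Hypothetical helper: one backward pass with world_size=1 on CPU."""
    store_dir = tempfile.mkdtemp()
    dist.init_process_group(
        backend="gloo",
        init_method="file://" + os.path.join(store_dir, "store"),
        rank=0,
        world_size=1,
    )
    try:
        model = torch.nn.Linear(4, 2)
        # param.grad becomes a view into the bucket instead of a separate copy.
        ddp_model = DDP(model, gradient_as_bucket_view=True)
        ddp_model.register_comm_hook(None, inplace_allreduce_hook)
        ddp_model(torch.ones(3, 4)).sum().backward()
        return model.weight.grad.clone()
    finally:
        dist.destroy_process_group()
```

This does not free the DDP buckets. It only avoids holding a second copy of the gradients next to them. If your custom bucketing needs its own layout, slicing `bucket.buffer()` should keep the views on the same storage, whereas `clone()` would allocate new memory.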