How to balance GPU memories in DDP?

When training a model with DDP, the GPU for rank 0 consumes much more memory than the others.
Because of that GPU, I cannot increase the batch size for training.
Is there a good way to deal with this?


Hey, did you intentionally create any tensors on cuda:0 from every process? If not, some library or code might be accidentally creating state on cuda:0. To avoid this, you can set the CUDA_VISIBLE_DEVICES environment variable so that each process only sees one GPU.
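As a rough sketch (not from the thread itself), the idea is to restrict each worker process to a single physical GPU before any CUDA context is created. The 4-process pool and the `worker` function here are illustrative assumptions; in a real job this would happen in your DDP spawn/launch function, with `import torch` coming only after the variable is set:

```python
import os
import multiprocessing as mp


def worker(rank):
    # Pin this process to one physical GPU *before* any CUDA context exists.
    # With CUDA_VISIBLE_DEVICES set to a single index, "cuda:0" inside this
    # process maps to physical GPU `rank`, so stray allocations made by any
    # library default to this process's own GPU instead of the real GPU 0.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(rank)
    # ... only now import torch, call init_process_group, build the DDP model ...
    return os.environ["CUDA_VISIBLE_DEVICES"]


if __name__ == "__main__":
    world_size = 4  # assumption: a 4-GPU machine
    with mp.Pool(world_size) as pool:
        # Each process ends up seeing exactly one GPU.
        print(pool.map(worker, range(world_size)))
```

The key constraint is ordering: CUDA reads the variable once, when the context is initialized, so setting it after the first CUDA call has no effect.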


Thanks for the reply @mrshenli

I haven’t explicitly set the device to cuda:0 at any point.
Even the official DDP example code shows the same unbalanced GPU memory consumption.

I solved the issue by calling torch.cuda.set_device(args.local_rank), which has the same effect as setting CUDA_VISIBLE_DEVICES: allocations that don’t specify a device index land on each process’s own GPU instead of cuda:0.
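A minimal sketch of this fix, assuming the usual `--local_rank` argument passed by the DDP launcher (the `pin_to_local_gpu` helper name and the availability guard are my additions, there so the snippet also runs on a machine without CUDA):

```python
import argparse


def pin_to_local_gpu(local_rank: int) -> str:
    """Make cuda:<local_rank> the default device for this process.

    Must run before any CUDA tensor is created, otherwise early
    allocations have already landed on cuda:0.
    """
    try:
        import torch  # assumed available in the actual training environment
        if torch.cuda.is_available():
            # After this call, tensor.cuda() / "cuda" without an index
            # refer to cuda:<local_rank> in this process.
            torch.cuda.set_device(local_rank)
    except ImportError:
        pass  # illustration only; keeps the sketch importable without torch
    return f"cuda:{local_rank}"


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--local_rank", type=int, default=0)
    args = parser.parse_args()
    device = pin_to_local_gpu(args.local_rank)
    # ... init_process_group, model.to(device), wrap in DDP ...
    print(device)
```

Compared with CUDA_VISIBLE_DEVICES, this keeps all GPUs visible to every process but changes which one is the default, so both approaches stop rank-0 from accumulating everyone’s stray allocations.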