Different batch size on different GPUs in DDP

I am training a classification model on 4 GPUs. I see 3 extra processes running on GPU 0, so it won't fit a batch size larger than 2, while the other GPUs can still take a batch size of 4. I don't understand what the 3 extra processes are, and is manually setting the batch size per GPU like this correct?

batch_size = 4
if gpu == 0:
    # GPU 0 has less free memory because of the extra processes
    batch_size = 2

It seems you might have created multiple CUDA contexts on the default device (GPU 0). Are you launching the script via torchrun? If so, make sure that each process sets the proper device, e.g. via torch.cuda.set_device.
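For reference, a minimal sketch of that setup for a torchrun launch (torchrun exports LOCAL_RANK for every worker; the setup function name is just illustrative):

import os
import torch
import torch.distributed as dist

def setup():
    local_rank = int(os.environ["LOCAL_RANK"])
    # Bind this process to its own GPU before any other CUDA work,
    # so no context or tensors end up on the default device (cuda:0)
    torch.cuda.set_device(local_rank)
    # torchrun provides rank / world size / master address via env vars
    dist.init_process_group(backend="nccl")
    return local_rank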


+1 to @ptrblck’s answer

You can also set CUDA_VISIBLE_DEVICES so that each process only sees one device and won't unintentionally create a CUDA context on cuda:0.
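For example, something along these lines at the top of each worker (sketch only; CUDA_VISIBLE_DEVICES is read when CUDA is first initialized, so it has to be set before the first CUDA call in that process):

import os
import torch

def worker(rank, world_size):
    # Restrict this process to one physical GPU; it then shows up as cuda:0
    # inside the process, so nothing can accidentally land on the real GPU 0
    os.environ["CUDA_VISIBLE_DEVICES"] = str(rank)
    device = torch.device("cuda:0")
    x = torch.randn(8, 10, device=device)  # placeholder work on the remapped device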

I am using mp.spawn, as suggested in another thread, for a single-machine multi-GPU setup.
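With mp.spawn the same fix applies: call torch.cuda.set_device(rank) at the top of the function you pass to mp.spawn, before init_process_group or any other CUDA work. A minimal sketch (the init_method address/port and the Linear model are placeholders, not your code):

import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP

def main_worker(rank, world_size):
    # mp.spawn passes the process index as the first argument; use it as the GPU id
    torch.cuda.set_device(rank)  # must happen before any other CUDA work in this process
    dist.init_process_group("nccl", init_method="tcp://127.0.0.1:23456",
                            rank=rank, world_size=world_size)
    model = torch.nn.Linear(10, 10).cuda(rank)  # placeholder model
    model = DDP(model, device_ids=[rank])
    # ... training loop goes here ...
    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = torch.cuda.device_count()
    mp.spawn(main_worker, args=(world_size,), nprocs=world_size)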