Why does each GPU occupy different memory using DDP?

I noticed something strange. When I use two GPUs, the memory occupied on each GPU is the same. But when I use 4 or 6 GPUs, the memory usage is not exactly the same across GPUs.

Has anyone run into this situation before?

A device might run a bit ahead of the others and could thus show a different memory footprint at the moment you measure it.
Also, if you didn't set e.g. cudnn to deterministic mode, the kernel selection might vary slightly between the devices, and different kernels can require different amounts of workspace memory.
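For reference, a minimal sketch of forcing deterministic cudnn kernel selection (these are the standard `torch.backends.cudnn` flags; whether this removes the memory difference in your setup is not guaranteed):

```python
import torch

# Ask cudnn to use deterministic algorithms and disable the
# benchmarking autotuner, so kernel selection (and thus workspace
# memory usage) stays consistent across devices and runs.
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
```

Note that `benchmark = False` can cost some speed, since cudnn no longer autotunes for the fastest kernel per input shape.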


Hi @ptrblck, thanks for your quick reply! I have set cudnn to deterministic mode, so maybe this is due to the different speeds of the GPUs. Will this affect the training process?

DDP synchronizes the devices when necessary and communicates the gradients etc. between the GPUs.
If one device is a few milliseconds faster, it will simply have to wait for the others, but besides that you shouldn't see any effects on training.
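To illustrate that synchronization, here is a minimal CPU-only DDP sketch using the `gloo` backend (the port, world size, and model are arbitrary choices for illustration). The `backward()` call is where DDP all-reduces the gradients, so each rank ends up with identical, averaged gradients even if one process reaches that point earlier:

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP

def worker(rank, world_size):
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29501"
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    model = DDP(torch.nn.Linear(4, 1))
    x = torch.randn(8, 4)  # each rank sees different data
    model(x).sum().backward()  # gradients are all-reduced here

    # Verify every rank holds the same averaged gradient.
    g = model.module.weight.grad.clone()
    gathered = [torch.zeros_like(g) for _ in range(world_size)]
    dist.all_gather(gathered, g)
    assert all(torch.allclose(gathered[0], t) for t in gathered)

    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = 2
    mp.spawn(worker, args=(world_size,), nprocs=world_size)
```

The faster rank simply blocks inside the all-reduce until the slower one arrives, which is why a small speed difference only costs a brief wait.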