Why does each GPU occupy different memory using DDP?

I noticed something strange. When I use two GPUs, the memory occupied on each GPU is the same. But when I use 4 or 6 GPUs, the memory usage is not exactly the same across GPUs.

Has anyone run into this situation before?

A device might run a bit ahead of the others and could thus show a different memory footprint at the moment you measure it.
Also, if you didn't set e.g. cudnn to deterministic mode, the kernel selection might vary slightly between the devices, and different kernels can require different amounts of workspace memory.
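For reference, a minimal sketch of forcing deterministic cudnn kernel selection (these are the standard `torch.backends.cudnn` flags; whether this removes the memory difference in your setup is not guaranteed):

```python
import torch

# Ask cudnn to use deterministic algorithms and disable the
# benchmarking autotuner, so kernel selection (and thus workspace
# memory usage) stays consistent across devices and runs.
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
```

Note that `benchmark = False` can cost some speed, since cudnn no longer autotunes for the fastest kernel per input shape.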


Hi @ptrblck, thanks for your quick reply! I have set cudnn to deterministic mode, so maybe this is due to the different speeds of the GPUs. Will this affect the training process?

DDP synchronizes the devices when necessary and communicates the gradients etc. between the GPUs.
If one device is a few milliseconds faster, it will simply have to wait for the others, but besides that you shouldn't see any effects on training.
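To illustrate that synchronization, here is a minimal CPU-only DDP sketch using the `gloo` backend (the port, world size, and model are arbitrary choices for illustration). The `backward()` call is where DDP all-reduces the gradients, so each rank ends up with identical, averaged gradients even if one process reaches that point earlier:

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP

def worker(rank, world_size):
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29501"
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    model = DDP(torch.nn.Linear(4, 1))
    x = torch.randn(8, 4)  # each rank sees different data
    model(x).sum().backward()  # gradients are all-reduced here

    # Verify every rank holds the same averaged gradient.
    g = model.module.weight.grad.clone()
    gathered = [torch.zeros_like(g) for _ in range(world_size)]
    dist.all_gather(gathered, g)
    assert all(torch.allclose(gathered[0], t) for t in gathered)

    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = 2
    mp.spawn(worker, args=(world_size,), nprocs=world_size)
```

The faster rank simply blocks inside the all-reduce until the slower one arrives, which is why a small speed difference only costs a brief wait.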