Very low GPU power usage

Hi, I am training a custom Vision Transformer model on ImageNet using 4 V100 GPUs. Although my GPU utilization is high, the power drawn by the GPUs is quite low. Is this normal?

Initially the training is quite fast, but over time the GPU power usage falls and the training becomes slow.

Is this a problem with my hardware setup? Thank you for your kind help.

You could check whether your GPUs are reducing their clocks to avoid overheating (thermal throttling). If so, you might want to check the cooling solution and improve it.
Also, you could profile your use case (e.g. via Nsight Systems) to see whether the GPUs are bottlenecked by other parts of the code and are waiting (e.g. in an NCCL call).
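
To make the clock check concrete, here is a minimal sketch that polls `nvidia-smi` from Python and flags GPUs whose SM clock sits well below its maximum. The helper names (`parse_line`, `looks_throttled`, `query_gpus`, `monitor`) and the 0.8 clock ratio are my own assumptions for illustration, not an official tool or threshold. A low clock is only a hint: it can also mean the GPU is idle and waiting on something else, which is what the profiler would show.

```python
import subprocess
import time

# Fields requested from `nvidia-smi --query-gpu`; the order must match the parsing below.
FIELDS = ["index", "utilization.gpu", "power.draw",
          "clocks.sm", "clocks.max.sm", "temperature.gpu"]


def to_number(text):
    """Convert one CSV cell to float; return None for values like '[N/A]'."""
    try:
        return float(text)
    except ValueError:
        return None


def parse_line(line):
    """Turn one CSV row of nvidia-smi output into a dict keyed by field name."""
    cells = [cell.strip() for cell in line.split(",")]
    return {name: to_number(cell) for name, cell in zip(FIELDS, cells)}


def looks_throttled(sample, clock_ratio=0.8):
    """Heuristic (assumed threshold): SM clock is below clock_ratio * max SM clock."""
    current, maximum = sample["clocks.sm"], sample["clocks.max.sm"]
    if current is None or maximum is None:
        return False
    return current < clock_ratio * maximum


def query_gpus():
    """Run nvidia-smi once and return one parsed sample per GPU."""
    cmd = ["nvidia-smi",
           "--query-gpu=" + ",".join(FIELDS),
           "--format=csv,noheader,nounits"]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return [parse_line(line) for line in out.splitlines() if line.strip()]


def monitor(interval_s=5.0, iterations=12):
    """Print a short log of clocks, power, and temperature while training runs."""
    for _ in range(iterations):
        for sample in query_gpus():
            flag = "LOW CLOCK" if looks_throttled(sample) else "ok"
            print(sample, flag)
        time.sleep(interval_s)
```

Calling `monitor()` in a second terminal while the training job runs should show whether the drop in power coincides with rising temperature and falling SM clocks. If clocks stay near the maximum while power falls, the input pipeline or communication is the more likely suspect.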
