Hi, I am training a custom Vision Transformer model on ImageNet using 4 V100 GPUs. Although my GPU utilization is high, the power drawn by the GPUs is quite low. Is this normal?
You could check whether your GPUs are reducing their clocks to avoid overheating (thermal throttling). If so, you might want to check the cooling solution and improve it.
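As a rough way to run that check from a script, here is a minimal sketch that polls `nvidia-smi` and compares the current SM clock with the maximum SM clock. The helper names (`parse_gpu_rows`, `clock_ratio`, `query_gpus`) are made up for this example, and it assumes `nvidia-smi` is on your `PATH`. A ratio well below 100% while the GPU is under load is only a hint that the clocks are being reduced, not proof of thermal throttling.

```python
import shutil
import subprocess

# Field names accepted by `nvidia-smi --query-gpu`.
FIELDS = [
    "index",
    "utilization.gpu",
    "power.draw",
    "power.limit",
    "clocks.sm",
    "clocks.max.sm",
    "temperature.gpu",
]


def parse_gpu_rows(csv_text):
    """Parse `--format=csv,noheader,nounits` output into one dict per GPU."""
    rows = []
    for line in csv_text.strip().splitlines():
        values = [v.strip() for v in line.split(",")]
        if len(values) == len(FIELDS):
            rows.append(dict(zip(FIELDS, values)))
    return rows


def clock_ratio(row):
    """Current SM clock divided by max SM clock, or None if not numeric."""
    try:
        return float(row["clocks.sm"]) / float(row["clocks.max.sm"])
    except (KeyError, ValueError, ZeroDivisionError):
        return None


def query_gpus():
    """Run nvidia-smi once; return [] if it is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(FIELDS),
        "--format=csv,noheader,nounits",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        return []
    return parse_gpu_rows(result.stdout)


if __name__ == "__main__":
    for row in query_gpus():
        ratio = clock_ratio(row)
        ratio_text = "n/a" if ratio is None else f"{ratio:.0%}"
        print(
            f"GPU {row['index']}: util {row['utilization.gpu']}%, "
            f"power {row['power.draw']}/{row['power.limit']} W, "
            f"SM clock {row['clocks.sm']}/{row['clocks.max.sm']} MHz "
            f"({ratio_text}), temp {row['temperature.gpu']} C"
        )
```

Run it while training is in progress so the numbers reflect the loaded state. `nvidia-smi -q -d PERFORMANCE` should additionally list the reasons the driver reports for any clock reduction.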
Also, you could profile your use case (e.g. via Nsight Systems) to see whether the GPUs are bottlenecked by other parts of the code and are spending time waiting (e.g. in an NCCL call).
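Before reaching for a full profiler, a quick first look can be had by timing how long the training loop waits for the next batch versus how long it spends in the step itself. The sketch below is not a substitute for Nsight Systems and is not tied to any framework. `time_split`, `batches`, and `step_fn` are hypothetical names for this example. Since GPU work is usually launched asynchronously, `step_fn` would need to synchronize with the device before returning (for example with `torch.cuda.synchronize()` if you are using PyTorch, which is an assumption here) for the step time to be meaningful.

```python
import time


def time_split(batches, step_fn, max_steps=50):
    """Return (seconds waiting for data, seconds in step_fn, steps run)."""
    wait_s = 0.0
    step_s = 0.0
    steps = 0
    it = iter(batches)
    while steps < max_steps:
        t0 = time.perf_counter()
        try:
            batch = next(it)  # time spent here is input-pipeline wait
        except StopIteration:
            break
        t1 = time.perf_counter()
        step_fn(batch)  # should block until the device work is finished
        t2 = time.perf_counter()
        wait_s += t1 - t0
        step_s += t2 - t1
        steps += 1
    return wait_s, step_s, steps
```

If the wait time is a large share of the total, the input pipeline is a likely suspect. If it is small, a profiler trace is the better tool for finding out where the step itself stalls, for example in communication calls.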