Nvidia A100 slow training speed

I moved my model's training from a Quadro RTX 6000 to an A100 GPU, but I don't see any speedup. I am training in mixed precision on both machines, and if anything the RTX is marginally faster than the A100. A100s are claimed to reduce training time, so what could be the reason for this?

It likely depends on whether your model has enough computation to fully utilize the GPU, or whether there is a bottleneck somewhere else in the system. If, for example, the model or the batch size is too small to saturate the GPU, you can see cases where performance mostly depends on factors like the peak clock rate each setup reaches. What utilization does nvidia-smi report?
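One way to watch utilization and memory while training runs is to poll nvidia-smi's query interface. The sketch below is an illustration, not an official tool: it assumes the standard `--query-gpu` fields `utilization.gpu`, `memory.used` and `memory.total`, and the helper names `parse_gpu_query` and `query_gpus` are hypothetical.

```python
import subprocess

# Standard nvidia-smi --query-gpu field names, requested in this order.
FIELDS = ("utilization.gpu", "memory.used", "memory.total")


def _to_int(field):
    """Convert one CSV field to int; non-numeric values such as "[N/A]" become None."""
    field = field.strip()
    return int(field) if field.isdigit() else None


def parse_gpu_query(csv_text):
    """Parse `--format=csv,noheader,nounits` output into one dict per GPU.

    utilization.gpu is a percentage; the memory values are in MiB.
    """
    rows = []
    for line in csv_text.strip().splitlines():
        util, used, total = (_to_int(v) for v in line.split(","))
        rows.append({"util_pct": util, "mem_used_mib": used, "mem_total_mib": total})
    return rows


def query_gpus():
    """Run nvidia-smi once and return parsed readings (requires an NVIDIA driver)."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={','.join(FIELDS)}",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_query(out)
```

Calling `query_gpus()` every second or so during training gives a rough picture. Note that `utilization.gpu` measures the fraction of time any kernel was running, so a high value does not prove the GPU is saturated, but a consistently low value strongly suggests too little work per step or a bottleneck outside the GPU (for example, data loading).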

I am using a batch size of 1 because my input to the model is 128x128x128. The GPU is using around 7.5 GB out of 40 GB.
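A quick back-of-the-envelope check on how much headroom this leaves. A single 128x128x128 volume is small (about 8 MiB as a single-channel fp32 tensor), so most of the 7.5 GB is likely activations, weights, optimizer state, the CUDA context and PyTorch's caching allocator, which nvidia-smi also counts. The sketch assumes a single-channel input and that memory grows linearly with batch size on top of a fixed overhead; `fixed_gb` is a hypothetical placeholder you would have to measure, not a known value.

```python
def input_bytes(shape, bytes_per_elem=4):
    """Size of one input tensor; 4 bytes/element assumes fp32 storage."""
    n = 1
    for d in shape:
        n *= d
    return n * bytes_per_elem


def max_batch_estimate(used_gb_at_bs1, total_gb, fixed_gb):
    """Rough batch-size bound, assuming memory = fixed_gb + batch * per_sample_gb.

    fixed_gb (weights, optimizer state, context) is an assumed input, not measured here.
    """
    per_sample = used_gb_at_bs1 - fixed_gb
    if per_sample <= 0:
        raise ValueError("fixed_gb must be smaller than the measured usage")
    return int((total_gb - fixed_gb) // per_sample)
```

With no fixed overhead, `max_batch_estimate(7.5, 40, fixed_gb=0)` gives 5; any real fixed overhead raises the bound, because only the per-sample part scales with batch size. Treat the result as a starting point for experiments rather than a limit.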

Could you check the utilization reported by nvidia-smi? You might want to increase the batch size until you are close to using all available memory or until the throughput in examples/sec stops improving.
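The "increase the batch size until throughput stops improving" advice can be automated with a small sweep. This is a sketch under assumptions: `step_fn(batch_size)` is a hypothetical function you supply that runs one full training step and waits for the GPU to finish (for example with `torch.cuda.synchronize()`), and the 5% `min_gain` cutoff is an arbitrary choice. The sweep also stops at the first `RuntimeError`, which is how PyTorch surfaces CUDA out-of-memory errors.

```python
import time


def throughput(step_fn, batch_size, warmup=3, iters=10):
    """Examples/sec for one batch size; warm-up steps are excluded from timing.

    step_fn must block until the GPU work is done, or the timing is meaningless.
    """
    for _ in range(warmup):
        step_fn(batch_size)
    start = time.perf_counter()
    for _ in range(iters):
        step_fn(batch_size)
    elapsed = time.perf_counter() - start
    return batch_size * iters / elapsed


def sweep_batch_sizes(step_fn, candidates, min_gain=0.05, measure=throughput):
    """Try increasing batch sizes; stop when the gain over the best so far is
    below min_gain, or when a step fails (e.g. CUDA out of memory)."""
    results = {}
    best = 0.0
    for bs in candidates:
        try:
            tput = measure(step_fn, bs)
        except RuntimeError:  # PyTorch's CUDA OOM error is a RuntimeError
            break
        results[bs] = tput
        if best and tput < best * (1 + min_gain):
            break  # throughput has plateaued
        best = max(best, tput)
    return results
```

For example, `sweep_batch_sizes(step_fn, [1, 2, 4, 8, 16])` returns examples/sec for each batch size it managed to run. Comparing those numbers on the RTX 6000 and the A100 at the same batch size is a fairer test than batch size 1 alone.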


Did you get the A100 to speed up?