Training on GPU is very slow

Hi, I am training a massive computer vision model with almost 130M parameters, but the training is very slow even on a multi-GPU setup. This may not be the right comparison, but when I used fairseq for machine translation, I found training to be very fast. Is there a reason for that? Am I missing any additional packages I need to install other than cuDNN?

Thank you

You would have to profile the workload to understand where the bottleneck in your overall training pipeline is in order to optimize it further. You could use the native PyTorch profiler or e.g. Nsight Systems as described here.
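A minimal sketch of the native PyTorch profiler, assuming a small stand-in model (your real model and data pipeline would go in its place). It records CPU activity, and CUDA activity when a GPU is available, then prints the most expensive ops, which is usually enough to see whether the time goes to GPU kernels, CPU-side ops, or something outside the model such as data loading:

```python
import torch
import torch.nn as nn
from torch.profiler import profile, record_function, ProfilerActivity

# Toy model standing in for the real 130M-parameter vision model
# (assumption: replace with your actual model and a real batch).
model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(), nn.Flatten(), nn.LazyLinear(10))
x = torch.randn(8, 3, 32, 32)

# Profile CUDA kernels too if a GPU is present.
activities = [ProfilerActivity.CPU]
if torch.cuda.is_available():
    activities.append(ProfilerActivity.CUDA)
    model = model.cuda()
    x = x.cuda()

with profile(activities=activities, record_shapes=True) as prof:
    with record_function("forward_pass"):
        model(x)

# The slowest ops appear at the top; a pipeline bottlenecked on data
# loading or CPU work will show little time in GPU kernels here.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
```

Wrapping each training stage (data loading, forward, backward, optimizer step) in its own `record_function` label makes it easy to see which stage dominates the step time.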