How to increase GPU usage during training

Hello, I just started learning PyTorch recently and I have stumbled upon a problem. While training an image-generating model, I noticed that my GPU usage is very low (1144 / 11000), around 10% of my GPU's capacity; my PC has a GTX 1080 Ti. When I asked my friend, they said I need to increase the batch size during data loading, so I increased it from 8 to 70, and my GPU usage did go up to (10000 / 11000), but the training time doesn't seem much different, and to my surprise the output is much worse than with a batch size of 8. So is there any way to speed up my training without affecting the output quality? Thank you

  • In nvidia-smi, Memory-Usage shows how much GPU memory the process uses, while GPU-Util reports what percentage of time one or more GPU kernels were active over a given sampling period. You say the training time doesn't seem different; check GPU-Util rather than Memory-Usage (see the timing sketch after this list).

  • In general, if you use BatchNorm, increasing the batch size tends to lead to better results. Since the batch size was increased roughly 9 times, if you keep the same hyperparameter configuration, such as num_epochs, you may also need to check for overfitting (just a possible conjecture).
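If GPU-Util keeps dipping toward 0, the GPU is usually idling while the next batch is being prepared. Here is a minimal sketch of how you could split wall-clock time between data loading and GPU work; `model`, `loader`, `criterion`, and `optimizer` are placeholders for your own objects, not code from the original post:

```python
import time
import torch

# `model`, `loader`, `criterion`, and `optimizer` are placeholders for your
# own objects; this only measures where the time goes during one epoch.
device = torch.device("cuda")
model.to(device)

data_time, gpu_time = 0.0, 0.0
end = time.time()

for images, targets in loader:
    data_time += time.time() - end          # time spent waiting for the DataLoader

    t0 = time.time()
    images = images.to(device, non_blocking=True)
    targets = targets.to(device, non_blocking=True)

    optimizer.zero_grad()
    loss = criterion(model(images), targets)
    loss.backward()
    optimizer.step()

    torch.cuda.synchronize()                 # make sure GPU work has finished before timing
    gpu_time += time.time() - t0

    end = time.time()

print(f"data loading: {data_time:.1f}s, forward/backward: {gpu_time:.1f}s")
```

If the data-loading time dominates, the GPU itself is not the bottleneck and a bigger batch size won't help much.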

Hi, thank you for your reply. I forgot to mention that I checked with watch -n 1 nvidia-smi, and the GPU-Util value usually jumps around between 0, 98, 30, 99, and so on. I also use a lot of BatchNorm in my model. I finished another training run from scratch, this time changing the batch size from 70 to 5, and the result was even better than with a batch size of 8. So I assume a bigger batch size gives worse results for my model, but is there a way to speed things up, as I asked in my original post?

It seems that your GPUs are waiting for your CPUs. Maybe most of the time is spent loading and preprocessing data.
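If that is the case, a common remedy is to let the DataLoader prepare batches in background worker processes and use pinned memory for faster host-to-GPU copies. A minimal sketch, assuming an ImageFolder-style dataset (the path and transform below are placeholders, not from the original post):

```python
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Placeholder transform/dataset -- substitute your own preprocessing pipeline.
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])
dataset = datasets.ImageFolder("path/to/train", transform=transform)

loader = DataLoader(
    dataset,
    batch_size=8,
    shuffle=True,
    num_workers=4,    # preprocess batches in parallel CPU worker processes
    pin_memory=True,  # page-locked memory speeds up transfers to the GPU
)
```

With pin_memory=True you can also pass non_blocking=True to .to(device) so the host-to-GPU copy can overlap with computation.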

So what would you suggest to make it faster?

Related to this:

I once read somewhere that enabling cuDNN can accelerate training by increasing GPU usage.
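The flag usually meant by this is cuDNN's autotuner, which PyTorch exposes directly. A minimal sketch; it only helps when input shapes stay fixed between iterations:

```python
import torch

# Let cuDNN benchmark several convolution algorithms on the first iterations
# and cache the fastest one per input shape. If input shapes keep changing,
# the repeated benchmarking can actually slow training down.
torch.backends.cudnn.benchmark = True
```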

I am not an expert, but theoretically, a larger batch size provides a more accurate gradient estimate.
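For intuition, this is just the standard averaging argument, not anything PyTorch-specific: assuming the per-example gradients are roughly independent with variance $\sigma^2$, the mini-batch gradient averages $B$ of them, so its variance shrinks as

```latex
\operatorname{Var}\!\left[\frac{1}{B}\sum_{i=1}^{B}\nabla_\theta \,\ell(x_i,\theta)\right]
  \approx \frac{\sigma^2}{B}.
```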