How to increase GPU usage during training

Hello, I just started learning PyTorch recently and I have stumbled upon a problem. While training an image-generating model, I noticed that my GPU usage is very low (1144 / 11000), around 10% of my GPU's capacity; my PC has a GTX 1080 Ti. When I asked my friend, they said I need to increase the batch size during data loading, so I increased it from 8 to 70, and my GPU usage did go up to (10000 / 11000), but the training time doesn't seem much different, and to my surprise the output is much worse than with a batch size of 8. So is there any way to speed up my training without affecting the output quality? Thank you

  • In nvidia-smi, Memory-Usage shows how much GPU memory the process uses, while GPU-Util reports what percentage of time one or more GPU kernels were active over a given sampling period. You say the training time doesn't seem different; check GPU-Util rather than Memory-Usage (see the timing sketch after this list).

  • In general, if you use BatchNorm, increasing the batch size tends to lead to better results. Since the batch size was increased roughly 9 times, if you keep the same hyperparameter configuration, such as num_epochs, you may also need to check for overfitting (just a possible conjecture).
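If GPU-Util keeps dipping toward 0, the GPU is usually idling while the next batch is being prepared. Here is a minimal sketch of how you could split wall-clock time between data loading and GPU work; `model`, `loader`, `criterion`, and `optimizer` are placeholders for your own objects, not code from the original post:

```python
import time
import torch

# `model`, `loader`, `criterion`, and `optimizer` are placeholders for your
# own objects; this only measures where the time goes during one epoch.
device = torch.device("cuda")
model.to(device)

data_time, gpu_time = 0.0, 0.0
end = time.time()

for images, targets in loader:
    data_time += time.time() - end          # time spent waiting for the DataLoader

    t0 = time.time()
    images = images.to(device, non_blocking=True)
    targets = targets.to(device, non_blocking=True)

    optimizer.zero_grad()
    loss = criterion(model(images), targets)
    loss.backward()
    optimizer.step()

    torch.cuda.synchronize()                 # make sure GPU work has finished before timing
    gpu_time += time.time() - t0

    end = time.time()

print(f"data loading: {data_time:.1f}s, forward/backward: {gpu_time:.1f}s")
```

If the data-loading time dominates, the GPU itself is not the bottleneck and a bigger batch size won't help much.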

Hi, thank you for your reply. I forgot to mention that I checked with watch -n 1 nvidia-smi, and the GPU-Util value usually jumps around between 0, 98, 30, 99, and so on. I also use a lot of BatchNorm in my model. I finished another training run from scratch, this time changing the batch size from 70 to 5, and the result was even better than with a batch size of 8. So I assume a bigger batch size gives worse results for my model, but is there a way to speed things up, as I asked in my original post?

It seems that your GPUs are waiting for your CPUs. Maybe most of the time is spent loading and preprocessing data.
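If that is the case, a common remedy is to let the DataLoader prepare batches in background worker processes and use pinned memory for faster host-to-GPU copies. A minimal sketch, assuming an ImageFolder-style dataset (the path and transform below are placeholders, not from the original post):

```python
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Placeholder transform/dataset -- substitute your own preprocessing pipeline.
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])
dataset = datasets.ImageFolder("path/to/train", transform=transform)

loader = DataLoader(
    dataset,
    batch_size=8,
    shuffle=True,
    num_workers=4,    # preprocess batches in parallel CPU worker processes
    pin_memory=True,  # page-locked memory speeds up transfers to the GPU
)
```

With pin_memory=True you can also pass non_blocking=True to .to(device) so the host-to-GPU copy can overlap with computation.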

So what would you suggest to make it faster?

Related to this:

I once read somewhere that enabling cuDNN can accelerate training by increasing GPU usage.
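The flag usually meant by this is cuDNN's autotuner, which PyTorch exposes directly. A minimal sketch; it only helps when input shapes stay fixed between iterations:

```python
import torch

# Let cuDNN benchmark several convolution algorithms on the first iterations
# and cache the fastest one per input shape. If input shapes keep changing,
# the repeated benchmarking can actually slow training down.
torch.backends.cudnn.benchmark = True
```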

I am not an expert, but theoretically, a larger batch size provides a more accurate gradient estimate.
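For intuition, this is just the standard averaging argument, not anything PyTorch-specific: assuming the per-example gradients are roughly independent with variance $\sigma^2$, the mini-batch gradient averages $B$ of them, so its variance shrinks as

```latex
\operatorname{Var}\!\left[\frac{1}{B}\sum_{i=1}^{B}\nabla_\theta \,\ell(x_i,\theta)\right]
  \approx \frac{\sigma^2}{B}.
```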