Today, I trained a model on the GPU.
When I checked the GPU, the utilization rate fluctuated between 20% and 60%, and it was not stable.
Could you please give me some advice on possible causes, so I can check them one by one?
Thanks a lot.
One reason might be that your mini-batch size is too small. If you increase the batch size, GPU utilization should increase as well.
Keep in mind that varying the mini-batch size can affect generalization, and that increasing it reduces the number of optimization steps you perform per epoch.
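To make the trade-off concrete, here is a small sketch (plain Python, no framework required) of how the number of optimizer updates per epoch shrinks as the batch size grows; the dataset size of 50,000 is just an assumed example value:

```python
import math

def steps_per_epoch(num_samples, batch_size):
    # Number of optimizer updates in one full pass over the data
    # (the last, possibly partial, batch still counts as one step).
    return math.ceil(num_samples / batch_size)

# Quadrupling the batch size cuts the updates per epoch to a quarter,
# so the learning-rate schedule may need adjusting accordingly.
print(steps_per_epoch(50_000, 32))   # 1563 steps per epoch
print(steps_per_epoch(50_000, 128))  # 391 steps per epoch
```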
Another possible reason is slow preprocessing of the data, which leaves the GPU idle while it waits for the next batch. In that case, you can increase the data loader's number of workers or try to speed up the preprocessing itself.
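The idea behind extra loader workers is to overlap CPU-side preprocessing with GPU compute. The stdlib-only sketch below illustrates that pattern with one background thread and a bounded queue; in a real training loop you would instead set the worker count on your framework's data loader (e.g. PyTorch's `DataLoader(num_workers=...)`), and the `preprocess` function here is a hypothetical stand-in for decoding or augmentation:

```python
import queue
import threading

def preprocess(i):
    # Hypothetical CPU-side preprocessing for sample i (a stand-in for
    # decoding/augmentation that would otherwise stall the GPU).
    return i * 2

def prefetching_loader(indices, buffer_size=4):
    """Yield preprocessed samples while a background thread prepares
    upcoming ones, mimicking what data-loader workers do."""
    q = queue.Queue(maxsize=buffer_size)
    sentinel = object()

    def worker():
        for i in indices:
            q.put(preprocess(i))  # blocks when the buffer is full
        q.put(sentinel)           # signal end of the dataset

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = q.get()
        if item is sentinel:
            break
        yield item  # the consumer (GPU step) runs while the worker prefetches

batches = list(prefetching_loader(range(8)))
print(batches)  # [0, 2, 4, 6, 8, 10, 12, 14]
```

With a fast enough preprocessing pipeline (or enough workers), the consumer never waits on the queue, which is exactly the condition under which GPU utilization stops dipping between batches.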