I am running a model implemented in PyTorch on four GPUs. GPU memory usage is up to 80%, while the Volatile GPU-Util is very low.
When debugging, I confirmed that all variables are on the GPUs, so I wonder if anyone could tell me what in the code could cause this problem?
Any help is more than welcome.
Your system is probably having trouble supplying data fast enough to saturate the compute on all 4 GPUs. How many workers are you using for the dataloader?
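One way to check whether data loading is the bottleneck is to time how long the loop waits for each batch versus how long the training step itself takes. Here is a minimal sketch; `time_data_vs_compute` and `step_fn` are hypothetical names (not part of PyTorch), and the helper works with any iterable, including a `DataLoader`:

```python
import time


def time_data_vs_compute(loader, step_fn, max_batches=50):
    """Return (seconds waiting for data, seconds in step_fn, batches seen).

    `loader` is any iterable of batches; `step_fn` is a hypothetical
    callable that runs one training step on a batch.
    """
    data_time = 0.0
    compute_time = 0.0
    batches = 0
    it = iter(loader)
    while batches < max_batches:
        t0 = time.perf_counter()
        try:
            batch = next(it)  # time spent here is the input pipeline
        except StopIteration:
            break
        t1 = time.perf_counter()
        # CUDA kernels run asynchronously, so for meaningful numbers
        # step_fn should end with torch.cuda.synchronize().
        step_fn(batch)
        t2 = time.perf_counter()
        data_time += t1 - t0
        compute_time += t2 - t1
        batches += 1
    return data_time, compute_time, batches
```

If the first number is much larger than the second, the GPUs are mostly idle waiting for input, which would match a Volatile GPU-Util that keeps dropping to 0%.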
I’m having a similar issue – using 8 workers in
Volatile GPU-Util is jumping between 0% and 80% roughly every half second. Is there anything I can do to improve this? I can send code if it would be helpful.
I also have a similar problem. Worse, in my case not only is the Volatile GPU-Util low, but the Memory-Usage is low as well.
I also have the same problem. Is the calculation inefficient?
I guess your data processing isn’t fast enough to keep up with the computation.
Try increasing the number of workers in your DataLoader; it may increase GPU utilization.
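For reference, a minimal sketch of the DataLoader settings that usually matter for input throughput. The `make_loader` helper, the random stand-in dataset, and the specific values are assumptions for illustration; a good `num_workers` depends on your CPU core count and storage, so treat 4 as a starting point to tune:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def make_loader(dataset, batch_size=64, num_workers=4):
    """Hypothetical helper: build a DataLoader with parallel loading enabled."""
    kwargs = dict(
        batch_size=batch_size,
        shuffle=True,
        num_workers=num_workers,  # worker processes that load batches in parallel
        pin_memory=torch.cuda.is_available(),  # page-locked memory speeds host-to-GPU copies
    )
    if num_workers > 0:
        # Keep workers alive between epochs instead of restarting them each time.
        kwargs["persistent_workers"] = True
    return DataLoader(dataset, **kwargs)


# Random stand-in data; replace with your real dataset.
dataset = TensorDataset(torch.randn(256, 3, 32, 32), torch.randint(0, 10, (256,)))

if __name__ == "__main__":
    loader = make_loader(dataset, batch_size=64, num_workers=4)
    device = "cuda" if torch.cuda.is_available() else "cpu"
    for images, labels in loader:
        # non_blocking=True lets the copy overlap with compute when memory is pinned.
        images = images.to(device, non_blocking=True)
        labels = labels.to(device, non_blocking=True)
        break
```

Raising `num_workers` beyond the number of available CPU cores tends not to help, so it is worth trying a few values and watching GPU-Util for each.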
I met the same problem with only one GPU (NVIDIA GTX 1080 Ti) while training on the ImageNet dataset with
In my case, the dataset images are on my Seagate 2TB HDD, so the problem may be partly caused by the slow read speed of the HDD. Putting the dataset on an SSD would probably help a lot.
However, I found it got better when setting num_workers larger, but the problem still happened sometimes. I think the shuffle procedure in Torch may also be a bottleneck.
You can use a different data loader, or store the preprocessed dataset once and then train directly on the preprocessed data. With ImageNet and VGG16, this works.
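A rough sketch of the "preprocess once, then reuse" idea, assuming the processed data fits in memory. The `preprocess`, `build_cache`, and `load_cached_dataset` functions and the file name are hypothetical, and the normalization is only a stand-in for whatever expensive per-sample work your pipeline does; for something the size of ImageNet you would likely need a sharded or memory-mapped format instead:

```python
import os
import tempfile

import torch
from torch.utils.data import TensorDataset


def preprocess(raw):
    # Hypothetical stand-in for expensive per-sample work (decode, resize, normalize).
    return (raw - raw.mean()) / (raw.std() + 1e-8)


def build_cache(raw_samples, labels, path):
    """Run the preprocessing once and save the results to disk."""
    processed = torch.stack([preprocess(x) for x in raw_samples])
    torch.save({"x": processed, "y": labels}, path)


def load_cached_dataset(path):
    """Load the preprocessed tensors; no per-sample work happens at training time."""
    blob = torch.load(path)
    return TensorDataset(blob["x"], blob["y"])


# Random stand-in data; replace with your real samples.
cache_path = os.path.join(tempfile.mkdtemp(), "train_cache.pt")
raw = torch.rand(32, 3, 8, 8)
labels = torch.randint(0, 10, (32,))

build_cache(raw, labels, cache_path)        # done once, ahead of training
cached = load_cached_dataset(cache_path)    # done at the start of each run
```

The trade-off is that random augmentations can no longer be baked into the cached step, so those would still have to run per batch.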