High GPU memory usage but low volatile GPU-Util


I am running a model implemented in PyTorch on four GPUs; the GPU memory usage is up to 80% while the volatile GPU-Util is very low.

When debugging, all variables are on the GPUs, so I wonder whether anyone could tell me what in the code might be causing this?

Any help is more than welcome.


Your system is probably having trouble supplying data fast enough to saturate the compute on all 4 GPUs. How many workers are you using for the dataloader?
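A minimal sketch of what "more workers" looks like, using a synthetic stand-in dataset (the dataset and shapes here are made up for illustration; only `num_workers` and `pin_memory` are the point):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in dataset: 1000 fake image-like samples of shape (3, 32, 32)
dataset = TensorDataset(torch.randn(1000, 3, 32, 32),
                        torch.randint(0, 10, (1000,)))

loader = DataLoader(
    dataset,
    batch_size=64,
    shuffle=True,
    num_workers=4,    # 4 worker processes prefetch batches in parallel
    pin_memory=True,  # page-locked host memory speeds up host-to-GPU copies
)

for images, labels in loader:
    # With pin_memory=True, non_blocking=True lets the copy overlap compute:
    # images = images.cuda(non_blocking=True)  # uncomment on a GPU machine
    pass
```

With too few workers the GPUs idle while the CPU decodes and augments the next batch, which is exactly the low-utilization pattern described above.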


I’m having a similar issue – using 8 workers in torch.utils.data.DataLoader, and volatile GPU-Util jumps between 0% and 80% roughly every half second. Anything I can do to improve this? I can send code if it’d be helpful.



I also have a similar problem. The worse thing in my case is that not only is the volatile GPU-Util low, but the memory usage is low as well.

I also have the same problem. Is the calculation inefficient?

I guess your data processing isn’t fast enough to keep up with the computation.

Try increasing the number of workers in your DataLoader; it may improve GPU utilization.
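One way to confirm this diagnosis is to time how long each iteration waits on the loader versus how long it computes. This is a hypothetical diagnostic sketch (the trivial `mean()` stands in for a real forward/backward pass); if `data_time` dominates, the bottleneck is data loading, not the GPU:

```python
import time
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(512, 3, 32, 32),
                        torch.randint(0, 10, (512,)))
loader = DataLoader(dataset, batch_size=64, num_workers=2)

data_time = 0.0
compute_time = 0.0
end = time.perf_counter()
for images, labels in loader:
    data_time += time.perf_counter() - end       # time spent waiting for the batch
    start = time.perf_counter()
    _ = images.mean()                            # stand-in for the training step
    compute_time += time.perf_counter() - start
    end = time.perf_counter()

print(f"data: {data_time:.3f}s  compute: {compute_time:.3f}s")
```

If the data side dominates, raising `num_workers` (or caching preprocessed data) should help; if compute dominates, the loader is not the problem.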

I met the same problem with only one GPU (NVIDIA GTX 1080 Ti) while training on the ImageNet dataset with the ImageFolder API.

In my case the dataset images are on a Seagate 2 TB HDD, so the problem may be partly caused by the slow read speed of the HDD. Moving the dataset to an SSD would probably help a lot.

However, I found it got better when setting shuffle=False and using a larger num_workers, though the problem still happened sometimes. I think the shuffling in PyTorch may also be a bottleneck.

You can use a new dataloader, or store the preprocessed dataset and load it directly instead of preprocessing on the fly. With ImageNet and VGG16, this worked for me.
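The "store the preprocessed dataset" idea can be sketched as follows: run the expensive transforms once, save the resulting tensors with `torch.save`, then train from the cached file every epoch. All names, shapes, and the `preprocess` function below are hypothetical stand-ins, not the poster's actual code:

```python
import os
import tempfile
import torch
from torch.utils.data import DataLoader, TensorDataset

def preprocess(raw):
    # Stand-in for expensive per-image transforms (decode, resize, normalize, ...)
    return (raw - raw.mean()) / (raw.std() + 1e-8)

raw_images = torch.randn(256, 3, 32, 32)
labels = torch.randint(0, 10, (256,))

# Preprocess once and cache the tensors to disk.
cache_path = os.path.join(tempfile.gettempdir(), "preprocessed.pt")
torch.save({"images": preprocess(raw_images), "labels": labels}, cache_path)

# Later (or in every epoch), load the cached tensors directly;
# no per-sample CPU work is left for the training loop.
cached = torch.load(cache_path)
loader = DataLoader(TensorDataset(cached["images"], cached["labels"]),
                    batch_size=32)
```

The trade-off is disk space and a fixed set of transforms: anything cached this way (e.g. random augmentations) is frozen at preprocessing time.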