I am trying to run my deep learning model on a single machine with 3 GPUs, but my GPU utilization is very low.
I can't increase the batch size, because that would exceed the memory available on the GPUs.
Below is my GPU memory graph: [image]
How can I increase the GPU utilization?
You would have to profile the code and isolate the potential bottleneck.
Since the GPU utilization is low, your data loading might be the bottleneck.
This post explains potential workarounds and best practices for this issue.
Thanks a lot for replying.
I tried to profile the code, but it kept running without printing any of the expected output. I posted about it here: Torch.utils.bottleneck keeps on running
I tried to increase num_workers from 1 to a larger value, but then I ran into the error described in this post: Training crashes due to - Insufficient shared memory (shm) - nn.DataParallel
I tried the fix suggested there, but according to that solution I already have enough memory.
I also tried increasing the batch size, but my GPUs' memory is already fully occupied, so I can't.
I also tried using distributed data parallel and posted about it here and here: Use Distributed Data Parallel correctly, and Is Distributed Data Parallel equivalent to "Defence Against the Dark Arts from Harry Potter". I still can't figure it out. I also tried Horovod from Uber, but that is not working either.
torch.utils.bottleneck might be rerunning to profile the CUDA kernels. How many times is the script being rerun?
To narrow down a potential data loading bottleneck, you could also use this AverageMeter object, as used in the ImageNet example.
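As a rough illustration of that idea, here is a minimal sketch of an AverageMeter-style timer. It is a simplified stand-in and not a copy of the class from the ImageNet example; `time_epoch`, `loader`, and `train_step` are hypothetical names used only for this sketch. It measures how long each iteration waits for the next batch compared to the total time per iteration.

```python
import time


class AverageMeter:
    """Keeps the latest value and the running average of a measurement."""

    def __init__(self, name):
        self.name = name
        self.reset()

    def reset(self):
        self.val = 0.0
        self.sum = 0.0
        self.count = 0
        self.avg = 0.0

    def update(self, val, n=1):
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count


def time_epoch(loader, train_step):
    """Iterate once over `loader` and time data loading vs. the full step.

    `loader` is any iterable of batches (e.g. a DataLoader) and `train_step`
    is a hypothetical callable that runs forward/backward/optimizer step.
    For accurate GPU timings, `train_step` should synchronize the device
    before returning (e.g. torch.cuda.synchronize()), since CUDA kernels
    are launched asynchronously.
    """
    data_time = AverageMeter("data")
    batch_time = AverageMeter("batch")

    end = time.perf_counter()
    for batch in loader:
        # Time spent waiting for the next batch to arrive.
        data_time.update(time.perf_counter() - end)

        train_step(batch)

        # Total time for this iteration (data loading + compute).
        batch_time.update(time.perf_counter() - end)
        end = time.perf_counter()

    return data_time, batch_time
```

If `data_time.avg` ends up close to `batch_time.avg`, the loop spends most of its time waiting for data, which would point at the data loading rather than the model as the bottleneck.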