How to increase GPU utilization

Thanks a lot for replying.
I tried to profile the code, but it kept running and never printed anything useful. I posted about it here - Torch.utils.bottleneck keeps on running
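If torch.utils.bottleneck never finishes, a lighter-weight alternative is torch.profiler, which can profile just a handful of iterations instead of the whole run. A minimal sketch (the Linear model and shapes are placeholders for your actual training step):

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Toy stand-in for one training step; replace with your model/batch.
model = torch.nn.Linear(128, 64)
x = torch.randn(32, 128)

# Profile only a few iterations; add ProfilerActivity.CUDA when on GPU.
with profile(activities=[ProfilerActivity.CPU]) as prof:
    for _ in range(5):
        y = model(x)

# Summary table of the most expensive ops.
table = prof.key_averages().table(sort_by="cpu_time_total", row_limit=5)
print(table)
```

If the table shows most time in data-loading or CPU ops rather than GPU kernels, that points to an input-pipeline bottleneck.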
I tried increasing num_workers from 1 to a higher value, but I ran into this error -
Training crashes due to - Insufficient shared memory (shm) - nn.DataParallel
I tried the solution from this post, but according to it I already have enough shared memory - Training crashes due to - Insufficient shared memory (shm) - nn.DataParallel
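One thing worth ruling out for the shm error: when training inside a Docker container, /dev/shm defaults to only 64 MB, and each DataLoader worker passes batches through shared memory, so multi-worker loading can exhaust it even when the host has plenty. A quick check, plus the container flags that usually fix it (the 8g size is only an illustrative value):

```shell
# Check how much shared memory the process actually sees
df -h /dev/shm

# If inside Docker, relaunch the container with more shared memory,
# e.g. (8g is an example, not a recommendation):
#   docker run --shm-size=8g ...
# or share the host's IPC namespace instead:
#   docker run --ipc=host ...
```

If `df` already shows a large /dev/shm, the container defaults are not the culprit.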

I also tried increasing the batch size, but my GPUs' memory is already full, so I can't.
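If memory is the only thing blocking a larger batch, gradient accumulation gives a larger effective batch size without extra GPU memory: accumulate gradients over several small batches and step the optimizer once. A minimal sketch with a toy model (accum_steps, the model, and the shapes are placeholders):

```python
import torch

model = torch.nn.Linear(10, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = torch.nn.MSELoss()

accum_steps = 4  # effective batch = accum_steps * per-step batch size
before = model.weight.detach().clone()  # snapshot, just to show weights update

# Fake data loader: 8 small batches of shape (8, 10).
data = [(torch.randn(8, 10), torch.randn(8, 1)) for _ in range(8)]

opt.zero_grad()
for i, (x, y) in enumerate(data):
    # Divide by accum_steps so the summed gradients match one big batch.
    loss = loss_fn(model(x), y) / accum_steps
    loss.backward()  # gradients accumulate across iterations
    if (i + 1) % accum_steps == 0:
        opt.step()
        opt.zero_grad()
```

The tradeoff is more optimizer-free forward/backward passes per update, so it improves the effective batch size rather than raw GPU utilization.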
I also tried using DistributedDataParallel; I posted about it multiple times here and here - Use Distributed Data Parallel correctly and Is Distributed Data Parallel equivalent to "Defence Against the Dark Arts from Harry Potter" - but I really can't figure it out. I also tried Horovod from Uber, but it isn't working either.
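With DDP, one way to narrow down where it goes wrong is to first verify the wrapping itself in a single-process "world" on CPU; if this runs, the problem is in the multi-process launch rather than the model. Everything here (gloo backend, port, toy model) is illustrative:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Single-process process group on CPU with the gloo backend,
# purely to check that DDP wrapping and backward() work.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

model = DDP(torch.nn.Linear(10, 1))  # on GPU: pass device_ids=[local_rank]
out = model(torch.randn(4, 10))

loss = out.sum()
loss.backward()  # DDP syncs gradients here (a no-op with world_size=1)

dist.destroy_process_group()
```

Once this works, the same script can be scaled out with `torchrun --nproc_per_node=N`, reading rank and world size from the environment variables torchrun sets.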