Since the device whose index is 0 holds the network parameters, how can I split the training batches differently to maximize the usage of all GPUs? In my case, every GPU except device[0] has low GPU-Util.
How can I fix this problem? Thank you.
Any help is welcome.
Thank you for your advice. The problem is that dividing the training samples into equal batches leads to imbalanced GPU utilization. The default GPU, whose index is 0, usually has noticeably higher memory occupation. So I think we should split the batches manually to maximize the utilization of multiple GPUs.
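The manual-splitting idea can be sketched with `torch.split`, which accepts a list of chunk sizes instead of splitting evenly. The sizes below are purely illustrative; the point is that device 0 can be given a smaller chunk, since under `nn.DataParallel` it also gathers the outputs and holds the parameters:

```python
import torch

# Hypothetical batch of 64 images; give device 0 a smaller share because
# it carries the extra memory cost of gathering outputs under DataParallel.
batch = torch.randn(64, 3, 32, 32)
chunk_sizes = [10, 18, 18, 18]  # illustrative sizes; must sum to the batch size
chunks = torch.split(batch, chunk_sizes, dim=0)

for i, chunk in enumerate(chunks):
    print(f"device {i}: chunk shape {tuple(chunk.shape)}")
```

Feeding these uneven chunks to the replicas would require overriding `DataParallel`'s default scatter behavior, which splits evenly; this snippet only shows the splitting step itself.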
Not totally. But I wrapped the loss into the network to alleviate this issue to some extent; take a look at this link:
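For context, the wrap-the-loss-into-the-network trick can be sketched roughly as below (a minimal sketch; all class and variable names here are illustrative, not taken from the linked code). Because the loss is computed inside `forward`, each `DataParallel` replica computes it on its own device, and only a small per-device scalar is gathered back onto device 0 instead of the full output tensor:

```python
import torch
from torch import nn

class ModelWithLoss(nn.Module):
    """Wrap the loss computation inside the module so that, under
    nn.DataParallel, each replica computes its own loss on its own device."""
    def __init__(self, model, criterion):
        super().__init__()
        self.model = model
        self.criterion = criterion

    def forward(self, inputs, targets):
        outputs = self.model(inputs)
        loss = self.criterion(outputs, targets)
        return loss.unsqueeze(0)  # keep a dim so DataParallel can concatenate

# Usage sketch with a toy model (CPU here; on a multi-GPU machine you
# would wrap it further: net = nn.DataParallel(net).cuda()).
net = ModelWithLoss(nn.Linear(10, 3), nn.CrossEntropyLoss())
x = torch.randn(4, 10)
y = torch.randint(0, 3, (4,))
loss = net(x, y).mean()  # average the per-device losses
loss.backward()
```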
Alternatively, I recommend using distributed training.
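With `DistributedDataParallel`, each process owns one GPU and computes its own loss and gradients locally, so there is no single gathering device to overload. Below is a minimal single-process CPU sketch using the `gloo` backend just to show the moving parts; in real multi-GPU training you would launch one process per GPU (e.g. with `torchrun --nproc_per_node=N train.py`) and use the `nccl` backend:

```python
import os
import torch
import torch.distributed as dist
from torch import nn
from torch.nn.parallel import DistributedDataParallel as DDP

# Single-process setup for illustration only; torchrun normally sets these.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

# Each process builds the model and wraps it; gradients are synchronized
# across processes automatically during backward().
model = DDP(nn.Linear(10, 1))
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x, y = torch.randn(8, 10), torch.randn(8, 1)
loss = ((model(x) - y) ** 2).mean()
loss.backward()
opt.step()

dist.destroy_process_group()
```

In a real run each process would also use a `DistributedSampler` so every rank sees a disjoint shard of the dataset.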
Thanks for your quick response. Yes, moving the loss computation into the network is one way to improve the first GPU's memory utilization, but I am not sure it helps that much. I would prefer to split the batches manually.