How to split the training batch into different sizes before feeding into multi GPUs?

Since the device with index 0 holds the network parameters, how can I split the training batch into different sizes per GPU to maximize overall GPU usage? In my case, every GPU except device[0] has low GPU-Util.
How can I fix this problem? Thank you.

Any help is welcome.

Hi,
Maybe you can use `nn.DataParallel`? https://pytorch.org/docs/stable/nn.html#dataparallel
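For example, a minimal sketch (the `net`, batch size, and two-GPU setup are just placeholders):

```python
import torch
import torch.nn as nn

# Assume `net` is any nn.Module and at least two GPUs are available.
net = nn.Linear(128, 10)

# Replicate the model on GPUs 0 and 1; each forward pass splits the
# input batch into equal chunks along dim 0.
model = nn.DataParallel(net, device_ids=[0, 1]).cuda()

x = torch.randn(32, 128).cuda()   # batch starts on the default GPU
out = model(x)                    # outputs are gathered back on GPU 0
```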

Thank you for your advice. The problem is that splitting the training samples into equal batches leads to unbalanced GPU utilization: the default GPU (index 0) usually has noticeably higher memory occupation. So I think we should split the batches manually to maximize the utilization of all GPUs.

Hi @jia_lee Have you solved this problem? Thanks.

Not totally. But I wrapped the loss computation into the network itself, which alleviates this issue to some extent; have a look at this link:


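For illustration, here is a minimal sketch of that wrapping idea, assuming a classification model with cross-entropy loss and two GPUs (the `FullModel` name is just a placeholder, not from the linked post):

```python
import torch
import torch.nn as nn

class FullModel(nn.Module):
    """Wrapper that computes the loss inside forward(), so each GPU
    replica reduces its own logits to a scalar instead of gathering
    the full output tensor on GPU 0."""

    def __init__(self, model, criterion):
        super().__init__()
        self.model = model
        self.criterion = criterion

    def forward(self, inputs, targets):
        outputs = self.model(inputs)
        # Returning the per-replica loss keeps the gathered tensor tiny.
        return self.criterion(outputs, targets)

net = nn.Linear(128, 10)
full_model = nn.DataParallel(FullModel(net, nn.CrossEntropyLoss()),
                             device_ids=[0, 1]).cuda()

x = torch.randn(32, 128).cuda()
y = torch.randint(0, 10, (32,)).cuda()
loss = full_model(x, y).mean()    # average the per-GPU losses
loss.backward()
```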
Alternatively, I recommend using distributed training.
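For reference, a minimal single-node sketch with `DistributedDataParallel`, assuming one process per GPU (the address, port, and toy model are placeholders):

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def worker(rank, world_size):
    # One process per GPU; each process keeps its own replica and
    # optimizer state, so no single device accumulates everything.
    os.environ['MASTER_ADDR'] = '127.0.0.1'
    os.environ['MASTER_PORT'] = '29500'
    dist.init_process_group('nccl', rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    model = DDP(nn.Linear(128, 10).cuda(rank), device_ids=[rank])
    x = torch.randn(16, 128).cuda(rank)  # each process loads its own shard
    loss = model(x).sum()
    loss.backward()                      # gradients are all-reduced across GPUs

    dist.destroy_process_group()

if __name__ == '__main__':
    world_size = torch.cuda.device_count()
    mp.spawn(worker, args=(world_size,), nprocs=world_size)
```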


Thanks for your quick response. Yeah, wrapping the loss is one way to reduce the first GPU's memory usage, but I'm not sure how much it helps. I would prefer to split the batches manually. :blush:
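For what it's worth, here is a minimal sketch of one way to split the batch manually: subclass `nn.DataParallel` and override its `scatter` step. The `UnevenDataParallel` name and the `chunk_sizes` argument are hypothetical, not part of PyTorch, and this only handles a single positional tensor input:

```python
import torch
import torch.nn as nn

class UnevenDataParallel(nn.DataParallel):
    """DataParallel variant that splits the batch into user-defined
    chunk sizes instead of equal chunks (hypothetical helper)."""

    def __init__(self, module, device_ids=None, output_device=None,
                 dim=0, chunk_sizes=None):
        super().__init__(module, device_ids=device_ids,
                         output_device=output_device, dim=dim)
        self.chunk_sizes = chunk_sizes  # e.g. [8, 12, 12] for 3 GPUs

    def scatter(self, inputs, kwargs, device_ids):
        # Fall back to the default equal split if no sizes were given.
        if self.chunk_sizes is None:
            return super().scatter(inputs, kwargs, device_ids)

        batch = inputs[0]
        # Split along the batch dimension with the requested sizes and
        # move each chunk to its target device.
        chunks = batch.split(self.chunk_sizes, dim=self.dim)
        scattered_inputs = [
            (chunk.to(torch.device('cuda', device_id)),)
            for chunk, device_id in zip(chunks, device_ids)
        ]
        scattered_kwargs = [{} for _ in scattered_inputs]
        return scattered_inputs, scattered_kwargs

net = nn.Linear(128, 10)
# Give device 0 a smaller chunk; the sizes must sum to the batch size.
model = UnevenDataParallel(net, device_ids=[0, 1],
                           chunk_sizes=[12, 20]).cuda()
out = model(torch.randn(32, 128).cuda())
```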