How to split the training batch into different sizes before feeding it to multiple GPUs?

Since the device with index 0 holds the network parameters, how can I split the training batches unevenly to maximize GPU usage? In my case, all GPUs except device[0] show low GPU-Util.
Maybe you can use ‘nn.DataParallel’?

Thank you for your advice. The problem is that dividing the training samples into equal batches leads to unbalanced GPU utilization. The default GPU, whose index is 0, usually has noticeably higher memory usage. So I think we should split the batches manually to maximize utilization across multiple GPUs.
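
To make the idea of a manual split concrete, here is a minimal sketch that computes uneven per-GPU chunk sizes from relative weights, so that device 0 can be given a smaller share. The function name `uneven_split_sizes` and the example weights are hypothetical, chosen only for illustration. The resulting sizes could be passed to `torch.split`, which accepts a list of section sizes, but as far as I know `nn.DataParallel` scatters into roughly equal chunks by default, so using these sizes would need custom scatter logic.

```python
def uneven_split_sizes(batch_size, weights):
    """Return one chunk size per device, proportional to `weights`.

    The sizes always sum to `batch_size`. `weights` is a hypothetical
    per-device share, e.g. [1, 2, 2, 2] to give device 0 half the load
    of each other device.
    """
    if batch_size < 0:
        raise ValueError("batch_size must be non-negative")
    if not weights or any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")

    total = sum(weights)
    # Ideal (fractional) share of the batch for each device.
    raw = [batch_size * w / total for w in weights]
    # Round down first, then hand out the leftover samples.
    sizes = [int(r) for r in raw]
    leftover = batch_size - sum(sizes)
    # Devices with the largest fractional remainder get the extra samples.
    order = sorted(range(len(weights)),
                   key=lambda i: raw[i] - sizes[i],
                   reverse=True)
    for i in order[:leftover]:
        sizes[i] += 1
    return sizes


# Example: 4 GPUs, device 0 gets half the share of the others.
sizes = uneven_split_sizes(70, [1, 2, 2, 2])
print(sizes)
```

The weights would have to be tuned by hand, for example by watching memory usage and GPU-Util on each device.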

Hi @jia_lee Have you solved this problem? Thanks.

Not totally. But I wrapped the loss into the network to alleviate this issue to some extent; look at this link:

Alternatively, I recommend using distributed training.
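
For readers without access to the link, here is a rough sketch of what wrapping the loss into the network could look like. It is only one possible reading of the idea, not necessarily the linked implementation. The class name `ModelWithLoss` and the toy `nn.Linear` model are assumptions for illustration. The point is that each replica computes its own loss, so only small loss tensors are gathered on device 0 instead of the full outputs.

```python
import torch
import torch.nn as nn


class ModelWithLoss(nn.Module):
    """Hypothetical wrapper: runs the model and the loss in one forward pass."""

    def __init__(self, model, criterion):
        super().__init__()
        self.model = model
        self.criterion = criterion

    def forward(self, inputs, targets):
        outputs = self.model(inputs)
        loss = self.criterion(outputs, targets)
        # Return shape (1,) so per-replica losses can be gathered along dim 0.
        return loss.unsqueeze(0)


# Toy model and data, assumed for the example only.
model = nn.Linear(16, 4)
wrapped = ModelWithLoss(model, nn.CrossEntropyLoss())

# Use DataParallel only when several GPUs are present; otherwise run on CPU.
if torch.cuda.device_count() > 1:
    wrapped = nn.DataParallel(wrapped).cuda()

device = next(wrapped.parameters()).device
inputs = torch.randn(8, 16).to(device)
targets = torch.randint(0, 4, (8,)).to(device)

# Average the per-replica losses into a scalar before backpropagating.
loss = wrapped(inputs, targets).mean()
loss.backward()
```

Note that taking the plain mean weights every replica equally, which matches the full-batch mean only when the chunks have the same size.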

Thanks for your quick response. Yeah, improving the loss is one way to improve the first GPU's memory utilization; however, I am not sure it helps much. I would prefer to split the batches manually.