Since the device whose index is 0 holds the network parameters, how can I split the training batches differently to maximize the usage of all GPUs? In my case, every GPU except device[0] has low GPU-Util.
How can I fix this problem? Thank you.
Any help is welcome.
Thank you for your advice. The problem is that dividing the training samples into equal batches leads to imbalanced GPU utilization. The default GPU, whose index is 0, usually has noticeably higher memory occupation. So I think we should split the batches manually to maximize the utilization of multiple GPUs.
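The manual-splitting idea can be sketched with `torch.split`, which accepts a list of chunk sizes instead of splitting evenly. The sizes below are purely illustrative; the point is that device 0 can be given a smaller chunk, since under `nn.DataParallel` it also gathers the outputs and holds the parameters:

```python
import torch

# Hypothetical batch of 64 images; give device 0 a smaller share because
# it carries the extra memory cost of gathering outputs under DataParallel.
batch = torch.randn(64, 3, 32, 32)
chunk_sizes = [10, 18, 18, 18]  # illustrative sizes; must sum to the batch size
chunks = torch.split(batch, chunk_sizes, dim=0)

for i, chunk in enumerate(chunks):
    print(f"device {i}: chunk shape {tuple(chunk.shape)}")
```

Feeding these uneven chunks to the replicas would require overriding `DataParallel`'s default scatter behavior, which splits evenly; this snippet only shows the splitting step itself.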
Not totally. But I wrapped the loss into the network to alleviate this issue to some extent; take a look at this link:
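For context, the wrap-the-loss-into-the-network trick can be sketched roughly as below (a minimal sketch; all class and variable names here are illustrative, not taken from the linked code). Because the loss is computed inside `forward`, each `DataParallel` replica computes it on its own device, and only a small per-device scalar is gathered back onto device 0 instead of the full output tensor:

```python
import torch
from torch import nn

class ModelWithLoss(nn.Module):
    """Wrap the loss computation inside the module so that, under
    nn.DataParallel, each replica computes its own loss on its own device."""
    def __init__(self, model, criterion):
        super().__init__()
        self.model = model
        self.criterion = criterion

    def forward(self, inputs, targets):
        outputs = self.model(inputs)
        loss = self.criterion(outputs, targets)
        return loss.unsqueeze(0)  # keep a dim so DataParallel can concatenate

# Usage sketch with a toy model (CPU here; on a multi-GPU machine you
# would wrap it further: net = nn.DataParallel(net).cuda()).
net = ModelWithLoss(nn.Linear(10, 3), nn.CrossEntropyLoss())
x = torch.randn(4, 10)
y = torch.randint(0, 3, (4,))
loss = net(x, y).mean()  # average the per-device losses
loss.backward()
```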
Alternatively, I recommend using distributed training.
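With `DistributedDataParallel`, each process owns one GPU and computes its own loss and gradients locally, so there is no single gathering device to overload. Below is a minimal single-process CPU sketch using the `gloo` backend just to show the moving parts; in real multi-GPU training you would launch one process per GPU (e.g. with `torchrun --nproc_per_node=N train.py`) and use the `nccl` backend:

```python
import os
import torch
import torch.distributed as dist
from torch import nn
from torch.nn.parallel import DistributedDataParallel as DDP

# Single-process setup for illustration only; torchrun normally sets these.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

# Each process builds the model and wraps it; gradients are synchronized
# across processes automatically during backward().
model = DDP(nn.Linear(10, 1))
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x, y = torch.randn(8, 10), torch.randn(8, 1)
loss = ((model(x) - y) ** 2).mean()
loss.backward()
opt.step()

dist.destroy_process_group()
```

In a real run each process would also use a `DistributedSampler` so every rank sees a disjoint shard of the dataset.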
Thanks for your quick response. Yes, moving the loss computation into the network is one way to improve the first GPU's memory utilization, but I am not sure it helps that much. I would prefer to split the batches manually.