Multi-GPU issue: "data on different gpus" error or unbalanced memory usage

When I train my model on multiple GPUs with `model = torch.nn.DataParallel(model); model = model.to(device)`, I need to create some additional GPU tensors inside the loss function. I find that these new tensors must be on the same GPU as the model's output, otherwise a "data on different gpus" error is raised. However, if I put all the additional tensors on that one GPU, the memory usage becomes very unbalanced: GPU 0 uses about 10000 MB while each of the other GPUs uses only about 3000 MB, and I have to shrink my batch size to keep training at all. Is there any solution?
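For reference, here is a minimal sketch of my setup (`MyModel` and `my_loss` are placeholders, not my actual code):

```python
import torch
import torch.nn as nn

device = torch.device("cuda:0")
model = nn.DataParallel(MyModel())  # MyModel: placeholder for my network
model = model.to(device)

def my_loss(output, target):
    # The loss needs an extra GPU tensor. It has to live on the same
    # device as `output`, or PyTorch raises "data on different gpus".
    weight = torch.ones(output.size(0), device=device)  # pinned to cuda:0
    return ((output - target) ** 2).mean(dim=1).mul(weight).mean()
```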