I'm using multiple GPUs for training, but I found that my GPU usage is inefficient: GPU 0 is fully loaded, while the last one is nearly idle. I noticed this from their RAM usage in the picture. Could anyone explain this phenomenon and tell me how to change it? Thanks! [image: GPU RAM usage]

I tried to use your modified DataParallel, but it failed. It gave me the error: assert len(targets) == len(inputs). I do not know where to start debugging. My network returns two items: a list containing tensors, and a tensor. I also tried splitting the list before the network returns, but that failed too. so as include t…

[image] DataParallel imbalanced memory usage Hi there @albanD, @Yuzhou_Song, I noticed there is a small mistake in the code you provided: it's necessary to unsqueeze the loss inside the forward pass so that DataParallel is able to gather the loss back. Loss provided by PyTorch loss f…

So, what should I do? Where and how should I modify the code?

Do you mean I should use the original nn.DataParallel instead of the method mentioned in the previous answer? I also noticed that the final modification is return torch.unsqueeze(loss, 0). Am I right?

I mean that the imbalance is discussed in that post, and there I provide working code to compute the loss inside DataParallel, as in the blog @Thomas_Wolf suggested. I'm just saying that, in case the code provided there does not work for you, you can try mine. I didn't check the blog's code.

Yeah, I finally got it, thank you! By the way, what is the purpose of returning the loss? Why is it being unsqueezed?

And how do I set the parameters to be trained, and how does saving work? Should I use Fullmodel.parameter, model.parameter, or parallel.parameter? Also, what should I save?

The trick is that if you compute the loss outside DataParallel, the outputs are gathered on a single GPU, and the loss is then calculated and backpropagated there. If you compute the loss inside DataParallel, instead of returning a huge amount of data (which will be collected on a single GPU), you just return a few floa…
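In case it helps, here is a minimal sketch of that idea, not the exact code from the linked post. The FullModel wrapper, the nn.Linear stand-in network, the CrossEntropyLoss choice and the random data are all assumptions made for illustration. The loss is computed inside forward and unsqueezed, so each replica hands back a 1-element tensor that can be gathered along dim 0.

```python
import torch
import torch.nn as nn


class FullModel(nn.Module):
    """Hypothetical wrapper: runs the model AND the loss on each replica."""

    def __init__(self, model, loss_fn):
        super().__init__()
        self.model = model
        self.loss_fn = loss_fn

    def forward(self, inputs, targets):
        outputs = self.model(inputs)
        loss = self.loss_fn(outputs, targets)
        # The loss is a 0-dim tensor; give it one dimension so the
        # per-replica losses can be gathered along dim 0.
        return torch.unsqueeze(loss, 0)


use_cuda = torch.cuda.is_available()
device = "cuda" if use_cuda else "cpu"

model = nn.Linear(10, 3)  # stand-in for the real network (assumption)
parallel = nn.DataParallel(FullModel(model, nn.CrossEntropyLoss()))
if use_cuda:
    parallel = parallel.cuda()

# The wrapper and the inner model share the same parameter tensors,
# so building the optimizer from either one trains the same weights.
optimizer = torch.optim.SGD(parallel.parameters(), lr=0.1)

inputs = torch.randn(8, 10, device=device)          # dummy batch
targets = torch.randint(0, 3, (8,), device=device)  # dummy labels

per_gpu_loss = parallel(inputs, targets)  # one entry per replica
loss = per_gpu_loss.mean()                # reduce to a scalar

optimizer.zero_grad()
loss.backward()
optimizer.step()

# One option for saving: the inner model's state_dict, whose keys
# carry no "module." prefix from the DataParallel wrapper.
state = model.state_dict()
```

Only the small per-replica loss tensor travels back to the first GPU, rather than the full outputs. Note that taking the mean of the per-replica losses matches the loss over the whole batch only when every replica receives a chunk of the same size.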

TypeError: _gather(): incompatible function arguments. The following argument types are supported: 1. (tensors: List[at::Tensor], dim: int, destination_index: Optional[int]) -> at::Tensor Invoked with: (tensor([8.9125], device='cuda:0', grad_fn=<UnsqueezeBackward0>), tensor([7.4709], device='cu…

How to balance the usage across multiple GPUs?


JuanFMontesinos (Juan Montesinos) November 28, 2018, 1:59pm 4

Try this one

It’s roughly the same