I get the following warning when running PyTorch 0.4.1 with Python 3.6 on two K80 GPUs:
/python3.6/site-packages/torch/nn/parallel/_functions.py:58: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.
I have a vector of losses returned: 2 GPUs return 2 losses. For now, I just reduce them with torch.mean().
Is this the right thing to do?
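In other words, something like this (the loss values here are made up for illustration):

```python
import torch

# With nn.DataParallel over 2 GPUs, a model that returns a scalar loss per
# replica produces one loss per device after the gather step (the warning
# says the scalars are unsqueezed into a vector along dim 0).
losses = torch.tensor([0.25, 0.35])  # hypothetical per-GPU losses

# Reduce to a single scalar before calling backward()
loss = torch.mean(losses)
```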
Ideally that should be fine.
You seem to be using DataParallel and MSELoss inside your model definition. Is that right, or am I missing something?
What if you apply MSELoss once after you collect the output from the forward pass?
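Something along these lines (a minimal sketch with a stand-in model; the DataParallel wrapping is commented out so it also runs on CPU):

```python
import torch
import torch.nn as nn

# Sketch: keep the loss outside the model so each replica returns a full
# output tensor, and compute MSELoss once on the gathered outputs.
model = nn.Linear(10, 1)             # stand-in for the real model
# model = nn.DataParallel(model)     # wrap for multi-GPU; outputs get gathered
criterion = nn.MSELoss()

inputs = torch.randn(8, 10)
targets = torch.randn(8, 1)

outputs = model(inputs)              # gathered outputs, shape (8, 1)
loss = criterion(outputs, targets)   # a single scalar loss
loss.backward()
```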
You are correct. I actually need the loss inside my model, since it seems like I cannot compute the gradient after nn.MSELoss.
The result is similar, so I think the warning is specifically due to my architecture.
Yes. It is due to your architecture. If you really want to avoid the warning, you should keep the loss function outside the model.
Hi! I am having the same problem, but I am using a PyTorch built-in model and thus cannot change its architecture. I'm curious what would happen if I ignore this warning: how will the unsqueeze operation influence the loss, the training process, and the final precision? Or, what should I do to fix this? Maybe run multiple models? I would really appreciate it if you could reply!
To the best of my knowledge, it won't affect the result as long as the batch size distributed to each GPU is the same. Otherwise, you need a weighted average of the losses. Precision should not be a big issue.
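For the uneven case, a weighted average could look like this (the per-GPU losses and sample counts below are made-up example numbers):

```python
import torch

# Sketch: if a batch splits unevenly across GPUs (e.g. batch size 10 over
# 2 GPUs as 6 + 4 samples), weight each per-GPU mean loss by its share
# of samples instead of taking a plain mean.
losses = torch.tensor([0.5, 0.2])   # hypothetical per-GPU mean losses
counts = torch.tensor([6.0, 4.0])   # samples processed on each GPU

weighted_loss = (losses * counts).sum() / counts.sum()
# (0.5 * 6 + 0.2 * 4) / 10 = 0.38
```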