I get the following warning when running PyTorch 0.4.1 with Python 3.6 on two K80 GPUs:
/python3.6/site-packages/torch/nn/parallel/_functions.py:58: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.
I have a vector of losses returned: 2 GPUs return 2 losses. For now, I just reduce them with torch.mean().
Is this the right thing to do?
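In other words, something like this (the loss values here are made up for illustration):

```python
import torch

# With nn.DataParallel over 2 GPUs, a model that returns a scalar loss per
# replica produces one loss per device after the gather step (the warning
# says the scalars are unsqueezed into a vector along dim 0).
losses = torch.tensor([0.25, 0.35])  # hypothetical per-GPU losses

# Reduce to a single scalar before calling backward()
loss = torch.mean(losses)
```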
Ideally that should be fine.
You seem to be using DataParallel and MSELoss inside your model definition. Is that right, or am I missing something?
What if you apply MSELoss once after you collect the output from the forward pass?
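Something along these lines (a minimal sketch with a stand-in model; the DataParallel wrapping is commented out so it also runs on CPU):

```python
import torch
import torch.nn as nn

# Sketch: keep the loss outside the model so each replica returns a full
# output tensor, and compute MSELoss once on the gathered outputs.
model = nn.Linear(10, 1)             # stand-in for the real model
# model = nn.DataParallel(model)     # wrap for multi-GPU; outputs get gathered
criterion = nn.MSELoss()

inputs = torch.randn(8, 10)
targets = torch.randn(8, 1)

outputs = model(inputs)              # gathered outputs, shape (8, 1)
loss = criterion(outputs, targets)   # a single scalar loss
loss.backward()
```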
You are correct. I actually need the loss inside my model, since it seems like I cannot compute the gradient after nn.MSELoss.
The result is similar, so I think the warning is specifically due to my architecture.
Yes. It is due to your architecture. If you really want to avoid the warning, you should keep the loss function outside the model.
Hi! I am having the same problem, but I am using a PyTorch built-in model and thus cannot change its architecture. I'm curious what would happen if I ignore this warning: how will the unsqueeze operation influence the loss, the training process, and the final precision? Or, what should I do to fix this? Maybe run multiple models? I would really appreciate it if you could reply!
To the best of my knowledge, it won't affect the result as long as the batch size distributed to each GPU is the same. Otherwise, you need a weighted average of the losses. Precision should not be a big issue.
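For the uneven case, a weighted average could look like this (the per-GPU losses and sample counts below are made-up example numbers):

```python
import torch

# Sketch: if a batch splits unevenly across GPUs (e.g. batch size 10 over
# 2 GPUs as 6 + 4 samples), weight each per-GPU mean loss by its share
# of samples instead of taking a plain mean.
losses = torch.tensor([0.5, 0.2])   # hypothetical per-GPU mean losses
counts = torch.tensor([6.0, 4.0])   # samples processed on each GPU

weighted_loss = (losses * counts).sum() / counts.sum()
# (0.5 * 6 + 0.2 * 4) / 10 = 0.38
```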