I don’t know what model you’re using, so it’s hard to help, but to find the problem I would try changing the batch size to 1 and see what happens, just to make sure the model’s output size really is 1.
This answer comes late, but I’ll leave it here for other people.
I recently hit the same problem with multi-GPU usage.
Problem: for DataParallel, the arguments passed to model.forward need to be Tensors.
Any non-Tensor argument is not scattered along the batch dimension; it is simply copied as-is to each GPU replica.
Solution: the easy fix is to pass a Tensor to model.forward, doing any type casting before the call instead of inside the forward function.
Another solution is to return the loss from the forward function (so each replica computes it locally), then normalize the gathered loss by the batch size.
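Both ideas can be combined in a minimal sketch (the toy `Net` below is hypothetical, just to illustrate the pattern): Tensors passed to forward get scattered along dim 0, and returning the loss from forward means each replica computes its own piece, which you then normalize by the batch size.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    # Hypothetical toy model whose forward returns the loss directly.
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 1)

    def forward(self, x, target):
        # x and target are Tensors, so DataParallel scatters them along
        # dim 0 across GPUs. A list or int argument would instead be
        # replicated unchanged to every GPU.
        pred = self.fc(x)
        # Return an un-normalized loss; DataParallel gathers one value
        # per replica.
        return F.mse_loss(pred, target, reduction="sum")

model = nn.DataParallel(Net())
x = torch.randn(8, 4)        # cast/convert to Tensor BEFORE calling forward
target = torch.randn(8, 1)
loss = model(x, target)
# With several GPUs, loss holds one entry per replica; summing and
# dividing by the batch size gives the mean loss in both the single-
# and multi-GPU cases.
loss = loss.sum() / x.size(0)
```

On a CPU-only machine DataParallel just calls the wrapped module directly, so the same code runs either way; the scatter/gather behaviour only kicks in when multiple GPUs are visible.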