MSE loss averaging over every pixel instead of over the batch

I am using the MSE loss from torch.nn.functional:

torch.nn.functional.mse_loss(input, target, size_average=None, reduce=None, reduction='mean') → Tensor

My input and target both have shape [16, 2, 48, 120], i.e. a batch size of 16 where each item is a tensor of shape [2, 48, 120].

Supplying the argument reduction="sum" returns tensor(3304.8472, grad_fn=<MseLossBackward>)
whereas the argument reduction="mean" returns tensor(0.0747, grad_fn=<MseLossBackward>)
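A quick way to confirm what `reduction='mean'` divides by is to compare both reductions on the same tensors. This is a minimal sketch using random float64 tensors with the shape from the question (the values are made up, so the printed losses will not match the ones above); float64 is chosen only to keep the comparison tight.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
# Placeholder tensors with the same shape as in the question
output = torch.rand(16, 2, 48, 120, dtype=torch.float64)
target = torch.rand(16, 2, 48, 120, dtype=torch.float64)

loss_sum = F.mse_loss(output, target, reduction="sum")
loss_mean = F.mse_loss(output, target, reduction="mean")

# 'mean' divides the summed squared error by the total element count,
# 16 * 2 * 48 * 120 = 184320, not by the batch size of 16
print(output.numel())  # 184320
print(torch.allclose(loss_mean, loss_sum / output.numel()))  # True
```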

This suggests that the function averages over every element (all pixels and channels) instead of dividing only by the batch size. How can I sum over every pixel and then divide by the batch size?

You could compute the summed loss and divide it by the batch size manually:

loss = F.mse_loss(output, target, reduction='sum') / output.size(0)
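The one-liner above can be cross-checked against an equivalent formulation that is not from the thread: take per-element losses with `reduction='none'`, sum them within each sample, then average over the batch. The tensors below are random placeholders with the question's shape.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
output = torch.rand(16, 2, 48, 120, dtype=torch.float64)
target = torch.rand(16, 2, 48, 120, dtype=torch.float64)

# Sum over every element, then divide by the batch size
loss_a = F.mse_loss(output, target, reduction="sum") / output.size(0)

# Same quantity: per-element losses, summed per sample, averaged over the batch
per_sample = F.mse_loss(output, target, reduction="none").sum(dim=(1, 2, 3))
loss_b = per_sample.mean()

print(per_sample.shape)  # torch.Size([16])
print(torch.allclose(loss_a, loss_b))  # True
```

The per-sample form may be convenient if you later want to weight or mask individual samples; for a plain batch both give the same value.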

I thought of that, but I wondered whether the division would affect the gradients. I guess not, thanks!
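Dividing by the batch size does change the gradients, but only by the constant factor 1 / batch_size, so their direction is unchanged (with plain SGD this behaves like a rescaled learning rate). The sketch below checks this with autograd on random placeholder tensors, and also compares against `reduction='mean'`, whose gradients are smaller than the batch-normalized ones by a further factor of 2 * 48 * 120 = 11520.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
target = torch.rand(16, 2, 48, 120, dtype=torch.float64)
batch_size = target.size(0)
per_item = target[0].numel()  # 2 * 48 * 120 = 11520 elements per sample

# Three identical leaf copies of the prediction so each loss gets its own .grad
pred_sum = torch.rand(16, 2, 48, 120, dtype=torch.float64, requires_grad=True)
pred_avg = pred_sum.detach().clone().requires_grad_(True)
pred_mean = pred_sum.detach().clone().requires_grad_(True)

F.mse_loss(pred_sum, target, reduction="sum").backward()
(F.mse_loss(pred_avg, target, reduction="sum") / batch_size).backward()
F.mse_loss(pred_mean, target, reduction="mean").backward()

# Dividing by a constant scales every gradient entry by 1 / batch_size
print(torch.allclose(pred_avg.grad, pred_sum.grad / batch_size))  # True
# 'mean' gradients are a further 11520 times smaller than the batch-normalized ones
print(torch.allclose(pred_avg.grad, pred_mean.grad * per_item))  # True
```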