Difference between nn.MSELoss and torch.mean((op-target)**2)

acobobby · June 24, 2019, 5:08pm

From the source code of torch.nn.MSELoss you can see that the class is a thin wrapper around torch.nn.functional.mse_loss. The relevant part of that function is (source code link):

if size_average is not None or reduce is not None:
    reduction = _Reduction.legacy_get_string(size_average, reduce)
if target.requires_grad:
    ret = (input - target) ** 2
    if reduction != 'none':
        ret = torch.mean(ret) if reduction == 'mean' else torch.sum(ret)
else:
    expanded_input, expanded_target = torch.broadcast_tensors(input, target)
    ret = torch._C._nn.mse_loss(expanded_input, expanded_target, _Reduction.get_enum(reduction))
return ret

So, as you can see, if target requires gradient, the operations are exactly the same as in your code, assuming the default reduction='mean' (hence, the gradient is the same).
If target does not require gradient, the compiled implementation of MSE (torch._C._nn.mse_loss) is used instead. I don’t know how it is implemented there, but I would expect it to perform the same calculation. Anyway, it may be worth waiting for someone more informed about this.
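One way to check this empirically is to compare both the loss values and the gradients on some random data. The snippet below is only a minimal sketch: the tensor names (op_a, op_b, target) and the shapes are made up for illustration, and target is left with requires_grad=False, so nn.MSELoss should take the compiled branch shown above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical data: names and shapes are chosen for illustration only.
op_a = torch.randn(4, 3, requires_grad=True)
# Independent copy of the same values, so each loss gets its own gradient.
op_b = op_a.detach().clone().requires_grad_(True)
target = torch.randn(4, 3)  # requires_grad is False by default

loss_builtin = nn.MSELoss()(op_a, target)       # default reduction='mean'
loss_manual = torch.mean((op_b - target) ** 2)  # hand-written version

loss_builtin.backward()
loss_manual.backward()

# Compare up to floating-point tolerance rather than bit-for-bit.
values_match = torch.allclose(loss_builtin, loss_manual)
grads_match = torch.allclose(op_a.grad, op_b.grad)
print(values_match, grads_match)
```

If the two code paths really do the same calculation, I would expect both flags to be True. Note that this only shows agreement up to floating-point tolerance on one input; it says nothing about speed or memory use of the two versions.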