From the source code of
torch.nn.MSELoss you can see that the class is a thin wrapper around
torch.nn.functional.mse_loss. The relevant part of that function is (source code link):
```python
if size_average is not None or reduce is not None:
    reduction = _Reduction.legacy_get_string(size_average, reduce)
if target.requires_grad:
    ret = (input - target) ** 2
    if reduction != 'none':
        ret = torch.mean(ret) if reduction == 'mean' else torch.sum(ret)
else:
    expanded_input, expanded_target = torch.broadcast_tensors(input, target)
    ret = torch._C._nn.mse_loss(expanded_input, expanded_target, _Reduction.get_enum(reduction))
```
So, as you can see, if target requires a gradient the operations are exactly the same as in your code (hence the gradient is the same).
If target does not require a gradient, the C++ implementation of MSE (torch._C._nn.mse_loss) is used instead. I don't know exactly how it is implemented there, but I would expect it performs the same calculation. Anyway, just wait for someone more informed about this.
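As a quick sanity check of the claim that both paths agree, here is a minimal sketch (my own example, not part of the PyTorch source) comparing the Python-level formula from the snippet above against F.mse_loss, for both the loss value and the gradient with respect to the input:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(4, 3, requires_grad=True)
t = torch.randn(4, 3)  # does not require grad, so F.mse_loss takes the C++ path

# Python-level path shown in the snippet above
manual = torch.mean((x - t) ** 2)
g_manual, = torch.autograd.grad(manual, x)

# Built-in functional path
builtin = F.mse_loss(x, t, reduction='mean')
g_builtin, = torch.autograd.grad(builtin, x)

print(torch.allclose(manual, builtin))      # same loss value
print(torch.allclose(g_manual, g_builtin))  # same gradient
```

Both comparisons come out equal (up to floating-point tolerance), which is consistent with the two branches computing the same thing.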