((x - x_hat)**2).sum()/len(x) vs nn.functional.mse_loss(x_hat, x)

No, these approaches are not equivalent in general; whether they match depends on the tensor shape.
As described in the docs, F.mse_loss uses reduction='mean' by default, so it divides the summed squared error by the total number of elements, not by the batch size.

import torch
import torch.nn.functional as F

x = torch.randn(10, 10)
x_hat = torch.randn(10, 10)

# default reduction='mean': divides by the number of elements (100 here)
print(F.mse_loss(x_hat, x))
# tensor(1.7013)

# manual sum divided by the batch size (10) is 10x larger in this case
print(((x - x_hat)**2).sum() / len(x))
# tensor(17.0125)

# dividing by the number of elements matches F.mse_loss
print(((x - x_hat)**2).sum() / x.nelement())
# tensor(1.7013)

print(((x - x_hat)**2).mean())
# tensor(1.7013)
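
If you actually want the sum of squared errors averaged only over the batch dimension, a minimal sketch (assuming the same x and x_hat as above) is to use the reduction argument and divide by the batch size yourself:

# sum over all elements, then divide by the batch size only
print(F.mse_loss(x_hat, x, reduction='sum') / len(x))
# matches ((x - x_hat)**2).sum() / len(x) above

# equivalently: per-sample squared-error sums, averaged over the batch
per_sample = F.mse_loss(x_hat, x, reduction='none').sum(dim=1)
print(per_sample.mean())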