No, these approaches are not equivalent; the difference depends on the tensor shape.
As described in the docs, F.mse_loss
uses reduction='mean' by default (i.e. it returns mean(L)),
which corresponds to a division by the total number of elements, not by the batch size.
import torch
import torch.nn.functional as F

x = torch.randn(10, 10)
x_hat = torch.randn(10, 10)

print(F.mse_loss(x_hat, x))
# tensor(1.7013)

# dividing by the batch size only gives a different result
print(((x - x_hat)**2).sum() / len(x))
# tensor(17.0125)

# dividing by the number of elements matches F.mse_loss
print(((x - x_hat)**2).sum() / x.nelement())
# tensor(1.7013)
print(((x - x_hat)**2).mean())
# tensor(1.7013)
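
If you do want the division by the batch size, a common approach is to use reduction='sum' and divide manually. A minimal sketch (the seed and tensor shapes are arbitrary assumptions for the demo):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(10, 10)
x_hat = torch.randn(10, 10)

# reduction='sum' returns the summed squared error,
# which we then divide by the batch size ourselves
loss_per_sample = F.mse_loss(x_hat, x, reduction='sum') / x.size(0)

# equivalent manual computation
manual = ((x - x_hat) ** 2).sum() / x.size(0)
print(torch.allclose(loss_per_sample, manual))  # True
```

This keeps the loss magnitude independent of the feature dimension, which can matter if you compare runs with different input sizes.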