Consider the following training code for an autoencoder:

```
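import torch
from torch import nn

# Use the GPU when available; `device` was not defined in the original snippet.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

def train(model, data, epoch, lr=0.001):
    opt = torch.optim.Adam(model.parameters(), lr)
    train_loss = [0.0] * epoch
    for i in range(epoch):
        for x, _ in data:  # labels are unused when training an autoencoder
            x = x.to(device)
            opt.zero_grad()
            x_hat = model(x)
            loss = nn.functional.mse_loss(x_hat, x)
            # loss = ((x - x_hat)**2).sum()/len(x)
            loss.backward()
            opt.step()
            # .item() stores a plain float instead of a tensor
            train_loss[i] += loss.item() * len(x)
        train_loss[i] /= len(data.dataset)
        print(f'Epoch: {i+1}/{epoch}, loss: {train_loss[i]}')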
```

Why do I get different values when I use `loss = ((x - x_hat)**2).sum()/len(x)` instead of `loss = nn.functional.mse_loss(x_hat, x)`? Aren't the two the same?
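For reference, here is a minimal sketch that reproduces the discrepancy outside the training loop. The batch shape `(4, 10)` and the random tensors are assumptions made for illustration only, and `x_hat` is a stand-in for the model output. With its default `reduction='mean'`, `mse_loss` averages over every element, while the manual expression divides the sum by the batch size only, so the two should differ by the number of elements per sample.

```python
import torch
from torch import nn

torch.manual_seed(0)
x = torch.randn(4, 10)      # hypothetical batch: 4 samples, 10 features each
x_hat = torch.randn(4, 10)  # stand-in for the model output

# Manual version: sum of squared errors divided by the batch size only.
manual = ((x - x_hat) ** 2).sum() / len(x)

# Built-in version: default reduction='mean' divides by all elements (4 * 10).
builtin = nn.functional.mse_loss(x_hat, x)

# Built-in version made to match the manual one: sum, then divide by batch size.
per_sample = nn.functional.mse_loss(x_hat, x, reduction='sum') / len(x)

features_per_sample = x[0].numel()
print(manual.item(), builtin.item(), per_sample.item())
print(torch.allclose(manual, builtin * features_per_sample))
```

If the per-sample scale is the one wanted, `reduction='sum'` followed by a division by `len(x)` appears to be the closest equivalent of the commented-out line.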