((x - x_hat)**2).sum()/len(x) vs nn.functional.mse_loss(x_hat, x)

Consider the following training code for an autoencoder:

import torch
from torch import nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

def train(model, data, epoch, lr=0.001):
    opt = torch.optim.Adam(model.parameters(), lr)
    train_loss = [0.0] * epoch
    for i in range(epoch):
        for x, y in data:
            x = x.to(device)
            opt.zero_grad()
            x_hat = model(x)
            loss = nn.functional.mse_loss(x_hat, x)
            # loss = ((x - x_hat)**2).sum() / len(x)
            loss.backward()
            opt.step()
            # use .item() so the computation graph is not kept alive while accumulating
            train_loss[i] += loss.item() * len(x)
        train_loss[i] /= len(data.dataset)
        print(f'Epoch: {i+1}/{epoch}, loss: {train_loss[i]}')

Why do I get different values when I use loss = ((x - x_hat)**2).sum()/len(x) and loss = nn.functional.mse_loss(x_hat, x)? Aren't both the same?

No, these approaches are not equivalent unless the input is 1-dimensional.
As described in the docs, F.mse_loss uses reduction='mean' by default, which divides the summed squared error by the total number of elements, not by the batch size. For an input of shape [N, D], your manual version divides by N while F.mse_loss divides by N*D, so the two results differ by a factor of D:

import torch
import torch.nn.functional as F

x = torch.randn(10, 10)
x_hat = torch.randn(10, 10)

print(F.mse_loss(x_hat, x))
# tensor(1.7013)
print(((x - x_hat)**2).sum() / len(x))
# tensor(17.0125)
print(((x - x_hat)**2).sum() / x.nelement())
# tensor(1.7013)
print(((x - x_hat)**2).mean())
# tensor(1.7013)
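
If you want F.mse_loss to reproduce your per-sample normalization inside the training loop, you can either pass reduction='sum' (a documented argument of F.mse_loss) and divide by the batch size yourself, or rescale the default mean by the number of elements per sample. A minimal sketch, assuming x has shape [batch_size, features]:

import torch
import torch.nn.functional as F

x = torch.randn(10, 10)
x_hat = torch.randn(10, 10)

# per-sample loss: sum all squared errors, then divide by the batch size
manual = ((x - x_hat)**2).sum() / len(x)

# same value via F.mse_loss with reduction='sum'
via_sum = F.mse_loss(x_hat, x, reduction='sum') / len(x)

# or rescale the default mean by the number of elements per sample
via_mean = F.mse_loss(x_hat, x) * x[0].nelement()

print(torch.allclose(manual, via_sum))   # True
print(torch.allclose(manual, via_mean))  # True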