Incorrect Smooth L1 Loss?

DKilkenny · December 15, 2020, 3:39am

Hello,

I have been going through all of the loss functions in PyTorch and building them from scratch to gain a better understanding of them, and I’ve run into what is either an issue with my recreation or an issue with PyTorch’s implementation.

According to PyTorch’s documentation for SmoothL1Loss, if the absolute value of the prediction minus the ground truth is less than beta, the top equation is used; otherwise, the bottom one is used. Please see the documentation for the equations.
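
For reference, the piecewise rule described above can be written out per element roughly as follows. This is a sketch in which x_n is one predicted value, y_n is the matching target, and \ell_n is the unreduced loss for that element; the symbol names are shorthand chosen here, not quoted notation. With the default reduction, the \ell_n values are then averaged over all elements.

```latex
\ell_n =
\begin{cases}
0.5\,(x_n - y_n)^2 / \beta, & \text{if } |x_n - y_n| < \beta \\
|x_n - y_n| - 0.5\,\beta,   & \text{otherwise}
\end{cases}
```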

Below is my implementation of this in the form of a minimal test:

import torch
import torch.nn as nn
import numpy as np

predictions = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5)

def l1_loss_smooth(predictions, targets, beta = 1.0):
    
    loss = 0
    for x, y in zip(predictions, targets):
        if abs(x-y).mean() < beta:
            loss += (0.5*(x-y)**2 / beta).mean()
        else:
            loss += (abs(x-y) - 0.5 * beta).mean()

    loss = loss/predictions.shape[0]
    return loss

output = l1_loss_smooth(predictions, target)
print(output)

Gives an output of:
tensor(0.7475, grad_fn=<DivBackward0>)

Now the PyTorch implementation:

loss = nn.SmoothL1Loss(beta=1.0)
output = loss(predictions, target)

Gives an output of:
tensor(0.7603, grad_fn=<SmoothL1LossBackward>)

I can’t figure out where the error in implementation lies.

Upon looking a little deeper into the smooth_l1_loss function in the _C module (file: smooth_c_loss_op.cc), I noticed that the docstring describes it as a variation on Huber loss, whereas the documentation for SmoothL1Loss says it is Huber loss.

So overall, I’m just confused about how it’s implemented and whether it’s a combination of SmoothL1Loss and Huber loss, just Huber loss, or something else.

Thanks!

liangbright · April 21, 2021, 2:06am

import torch
from torch.nn.functional import smooth_l1_loss, l1_loss
y_pred=torch.tensor(1.0)
y_true=torch.tensor(1.12)

loss1=smooth_l1_loss(y_pred, y_true, 1e-2, reduction = 'sum')
loss2=smooth_l1_loss(y_pred, y_true, 0, reduction = 'sum')
loss3=l1_loss(y_pred, y_true, reduction = 'sum')
print((y_pred-y_true).abs().item(), loss1.item(), loss2.item(), loss3.item())

Output is
0.12000000476837158 0.007200000341981649 0.007200000341981649 0.12000000476837158

weird

ptrblck · April 21, 2021, 10:56pm

The third positional argument to smooth_l1_loss is size_average, not beta, so you would have to pass beta as a keyword argument (beta=1e-2 and beta=0.0). This will then give the same loss output as a corrected version of the initial custom code:

y_pred=torch.tensor(1.0)
y_true=torch.tensor(1.12)

loss1=smooth_l1_loss(y_pred, y_true, beta=1e-2, reduction = 'mean')
loss2=smooth_l1_loss(y_pred, y_true, beta=0, reduction = 'mean')
loss3=l1_loss(y_pred, y_true, reduction = 'mean')
print((y_pred-y_true).abs().item(), loss1.item(), loss2.item(), loss3.item())
> 0.12000000476837158 0.11500000208616257 0.12000000476837158 0.12000000476837158
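
To see why the positional call behaved differently, one option is to inspect the parameter order directly. The snippet below is a minimal sketch, assuming a PyTorch version in which the legacy size_average argument is still accepted (it may emit a deprecation warning, which is silenced here); the variable names are only illustrative.

```python
import inspect
import warnings

import torch
from torch.nn.functional import smooth_l1_loss

# Parameter names in positional order; the third entry is the one that
# a bare third positional value gets bound to.
param_names = list(inspect.signature(smooth_l1_loss).parameters)
print(param_names)

y_pred = torch.tensor(1.0)
y_true = torch.tensor(1.12)

with warnings.catch_warnings():
    # Legacy reduction arguments may trigger a deprecation warning.
    warnings.simplefilter("ignore")
    # 1e-2 is bound to size_average here, so beta keeps its default of 1.0.
    positional = smooth_l1_loss(y_pred, y_true, 1e-2)

# Passing beta by keyword sets the threshold that was actually intended.
keyword = smooth_l1_loss(y_pred, y_true, beta=1e-2)

print(positional.item(), keyword.item())
```

With beta left at 1.0, the difference of 0.12 falls in the quadratic branch (0.5 * 0.12**2), which accounts for the 0.0072 values in the earlier post.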

def l1_loss_smooth(predictions, targets, beta = 1e-2):
    
    loss = 0
    for x, y in zip(predictions, targets):
        if abs(x-y) < beta:
            loss += (0.5*(x-y)**2 / beta).mean()
        else:
            loss += (abs(x-y) - 0.5 * beta).mean()

    loss = loss/predictions.shape[0]
    return loss

output = l1_loss_smooth(y_pred.unsqueeze(0), y_true.unsqueeze(0), beta=1e-2)
print(output)
> tensor(0.1150)

output = l1_loss_smooth(y_pred.unsqueeze(0), y_true.unsqueeze(0), beta=0.0)
print(output)
> tensor(0.1200)
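
For the 2-D tensors in the original question, the same per-element rule can be sketched in vectorized form. Note that the first version in this thread compared the mean of each row against beta, whereas the function above and nn.SmoothL1Loss apply the comparison to every element separately, which would explain the mismatch. The helper name smooth_l1_elementwise is hypothetical, and this is only a sketch under the assumption of the default 'mean' reduction, not PyTorch's actual implementation.

```python
import torch
import torch.nn.functional as F


def smooth_l1_elementwise(predictions, targets, beta=1.0):
    """Hypothetical helper: smooth L1 with a per-element branch and mean reduction."""
    diff = (predictions - targets).abs()
    if beta == 0:
        # With beta == 0 the quadratic region vanishes, leaving plain L1.
        return diff.mean()
    quadratic = 0.5 * diff ** 2 / beta   # used where |x - y| < beta
    linear = diff - 0.5 * beta           # used everywhere else
    # torch.where picks the branch independently for each element.
    return torch.where(diff < beta, quadratic, linear).mean()


torch.manual_seed(0)  # arbitrary seed, only for repeatability
predictions = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5)

custom = smooth_l1_elementwise(predictions, target, beta=1.0)
reference = F.smooth_l1_loss(predictions, target, beta=1.0)

# The two values should agree up to floating-point tolerance.
print(custom.item(), reference.item(), torch.allclose(custom, reference))
```

Using torch.where avoids the Python loop and keeps the result differentiable with respect to predictions.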