Hello,

I have been going through the loss functions in PyTorch and rebuilding them from scratch to understand them better, and I've run into what is either an issue with my recreation or an issue with PyTorch's implementation.

According to PyTorch's documentation for SmoothL1Loss, if the absolute value of the prediction minus the ground truth is less than beta, we use the quadratic (top) equation; otherwise, we use the linear (bottom) one. Please see the documentation for the equations.
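For reference, my reading of the docs is that this condition is checked per element, not per row. A direct elementwise translation of that rule might look like the following (a sketch of my understanding, not the official implementation):

```
import torch

def smooth_l1_elementwise(predictions, targets, beta=1.0):
    # Apply the piecewise rule to every element individually,
    # then mean-reduce (matching the default reduction='mean').
    diff = predictions - targets
    abs_diff = diff.abs()
    per_elem = torch.where(abs_diff < beta,
                           0.5 * diff ** 2 / beta,
                           abs_diff - 0.5 * beta)
    return per_elem.mean()
```

If that reading is right, this should agree with `nn.SmoothL1Loss(beta=1.0)`.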

Below is my implementation of this in the form of a minimal test:

```
import torch
import torch.nn as nn

predictions = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5)

def l1_loss_smooth(predictions, targets, beta=1.0):
    loss = 0
    for x, y in zip(predictions, targets):
        # check the condition on the mean absolute difference of each row
        if abs(x - y).mean() < beta:
            loss += (0.5 * (x - y) ** 2 / beta).mean()
        else:
            loss += (abs(x - y) - 0.5 * beta).mean()
    return loss / predictions.shape[0]

output = l1_loss_smooth(predictions, target)
print(output)
```

Gives an output of:

`tensor(0.7475, grad_fn=<DivBackward0>)`

Now the PyTorch implementation:

```
loss = nn.SmoothL1Loss(beta=1.0)
output = loss(predictions, target)
```

Gives an output of:

`tensor(0.7603, grad_fn=<SmoothL1LossBackward>)`

I can’t figure out where the error in implementation lies.

Upon looking a little deeper into the `smooth_l1_loss` function in the `_C` module (file: `smooth_c_loss_op.cc`), I noticed that the docstring says it's a variation on Huber loss, but the documentation for `SmoothL1Loss` says it *is* Huber loss.

So overall, I'm just confused about how it's implemented, and whether it's a combination of Smooth L1 loss and Huber loss, just Huber loss, or something else.
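For what it's worth, my current understanding (an assumption worth verifying against the docs) is that SmoothL1Loss is Huber loss divided by beta, with beta playing the role of Huber's delta, so the two should agree up to that scale factor. A quick check along those lines:

```
import torch
import torch.nn as nn

torch.manual_seed(0)
predictions = torch.randn(3, 5)
target = torch.randn(3, 5)

beta = 2.0
smooth = nn.SmoothL1Loss(beta=beta)(predictions, target)
huber = nn.HuberLoss(delta=beta)(predictions, target)  # nn.HuberLoss exists in PyTorch >= 1.9

# If the relationship holds, huber == beta * smooth
print(torch.allclose(huber, beta * smooth))
```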

Thanks!