I’ve been working on an imbalanced binary classification problem with a true-to-false ratio of 9:1; my input is 20-dimensional tabular data. To handle the imbalance, I decided to use focal loss. My implementation of this loss is a PyTorch adaptation of the TensorFlow one.

I ran a couple of tests, and my focal loss implementation produces the same output as the TensorFlow version.

```
import torch
import torch.nn as nn

class FocalLoss(nn.Module):
    """
    Weighs the contribution of each sample to the loss based on the classification error.
    :gamma: Focusing parameter. gamma=0 is equivalent to BCE_loss
    """
    def __init__(self, gamma, eps=1e-6):
        super(FocalLoss, self).__init__()
        self.gamma = gamma
        self.eps = eps

    def forward(self, y_pred, y_true):
        y_true = y_true.float()
        pred_prob = torch.sigmoid(y_pred)
        # 'reduce=False' is deprecated; use reduction='none'
        ce = nn.BCELoss(reduction='none')(pred_prob, y_true)
        # p_t: probability the model assigns to the true class
        p_t = (y_true * pred_prob) + ((1 - y_true) * (1 - pred_prob))
        modulator = 1.0
        if self.gamma:
            # a float exponent works directly; no need for a tensor on a global device
            modulator = torch.pow(1.0 - p_t, self.gamma)
        return torch.mean(modulator * ce)
```
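For what it's worth, one quick sanity check is that with gamma=0 the focal loss should collapse to plain BCE, and with gamma>0 it should never exceed BCE (the modulator lies in [0, 1]). Here is a minimal functional version of the same logic with those two checks (the tensors are made-up toy values):

```python
import torch
import torch.nn.functional as F

def focal_loss(y_pred, y_true, gamma):
    # Same logic as the class above, written functionally
    pred_prob = torch.sigmoid(y_pred)
    ce = F.binary_cross_entropy(pred_prob, y_true, reduction='none')
    p_t = y_true * pred_prob + (1 - y_true) * (1 - pred_prob)
    return torch.mean((1.0 - p_t) ** gamma * ce)

logits = torch.tensor([2.0, -1.0, 0.5, -0.3])
targets = torch.tensor([1.0, 0.0, 0.0, 1.0])

# gamma=0 reduces to plain BCE on the same probabilities
bce = F.binary_cross_entropy(torch.sigmoid(logits), targets)
assert torch.allclose(focal_loss(logits, targets, gamma=0.0), bce)

# gamma>0 down-weights easy examples, so the loss can only shrink
assert focal_loss(logits, targets, gamma=2.0) < bce
```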

My model is a simple fully connected NN with n hidden layers (n=4 in this test) and ReLU activations. This architecture works reasonably well when cross-entropy is used as the loss function.
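For concreteness, the architecture is roughly this (the hidden width of 64 is my assumption; only the 20-dim input, 4 hidden layers, and ReLU come from the description):

```python
import torch
import torch.nn as nn

def make_mlp(in_dim=20, hidden=64, n_hidden=4):
    # Stack of Linear + ReLU blocks, ending in a single logit
    layers, dim = [], in_dim
    for _ in range(n_hidden):
        layers += [nn.Linear(dim, hidden), nn.ReLU()]
        dim = hidden
    layers.append(nn.Linear(dim, 1))  # raw logit; sigmoid lives in the loss
    return nn.Sequential(*layers)

model = make_mlp()
out = model(torch.randn(8, 20))  # one logit per sample
```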

The figure below shows the gradient flow after one epoch. The validation loss approached 7e-12 and the validation accuracy is 50%.

I am using the Adam optimizer with lr=1e-4.

What do you think about my implementation of the Focal loss? Is it legit?

I am also using a class-balanced sampler, so each batch contains an equal number of true and false examples.
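In case it matters, by "class balanced sampler" I mean something along the lines of `WeightedRandomSampler` with inverse-frequency weights (the toy data here is illustrative; my actual sampler enforces exact balance per batch, while this one balances in expectation):

```python
import torch
from torch.utils.data import WeightedRandomSampler, TensorDataset, DataLoader

# Toy 9:1 imbalanced labels, mirroring the problem's class ratio
labels = torch.cat([torch.ones(90), torch.zeros(10)])

# Inverse class frequency: minority samples get 9x the sampling weight
class_count = torch.bincount(labels.long()).float()
weights = 1.0 / class_count[labels.long()]
sampler = WeightedRandomSampler(weights, num_samples=len(labels), replacement=True)

dataset = TensorDataset(torch.randn(100, 20), labels)
loader = DataLoader(dataset, batch_size=20, sampler=sampler)
batch_x, batch_y = next(iter(loader))
```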