How to use Focal Loss for an imbalanced dataset in a binary classification problem?

I have been searching on GitHub, Google, and the PyTorch forum, but there doesn't seem to be a tutorial on using a PyTorch-based focal loss for an imbalanced dataset in binary classification. Further, there are many variations of this loss. Given its effectiveness and popularity, is there a standardized version of it inside the newer PyTorch library itself? If not, which open-source implementation of focal loss for binary classification in PyTorch would the experts in the field suggest?

Further, if 14% of my dataset belongs to the positive class, how do I assign the class weights in this focal loss when using it?

The questions posed above have unfortunately been left unanswered.

For example, here’s another implementation that I found on GitHub:

import numpy as np


def focal_loss_lgb_eval_error(y_pred, dtrain, alpha, gamma):
    """
    Adaptation of the Focal Loss for lightgbm to be used as evaluation loss

    Parameters:
    -----------
    y_pred: numpy.ndarray
        array with the predictions (raw scores, before the sigmoid)
    dtrain: lightgbm.Dataset
    alpha, gamma: float
        See original paper https://arxiv.org/pdf/1708.02002.pdf
    """
    a, g = alpha, gamma
    y_true = dtrain.label
    # Convert raw scores to probabilities with a sigmoid.
    p = 1 / (1 + np.exp(-y_pred))
    # alpha-weighted, gamma-modulated binary cross-entropy, per sample.
    loss = (
        -(a * y_true + (1 - a) * (1 - y_true))
        * ((1 - (y_true * p + (1 - y_true) * (1 - p))) ** g)
        * (y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
    )
    # LightGBM eval format: (name, value, is_higher_better).
    return 'focal_loss', np.mean(loss), False
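
For reference, here is a quick sanity check of how that eval function could be called directly; the toy labels, raw scores, and the alpha/gamma values are illustrative assumptions, not from the thread. Note that the function applies its own sigmoid, so it expects raw scores, which is what LightGBM hands to feval when training with a matching custom focal objective.

import numpy as np
import lightgbm as lgb

# Toy data with roughly 14% positives, mirroring the imbalance in the question.
y_true = np.array([0, 0, 0, 0, 0, 0, 1], dtype=float)
raw_scores = np.array([-2.0, -1.5, -3.0, -0.5, -2.5, -1.0, 0.8])  # pre-sigmoid margins

dtrain = lgb.Dataset(data=np.zeros((7, 1)), label=y_true)
name, value, higher_better = focal_loss_lgb_eval_error(
    raw_scores, dtrain, alpha=0.25, gamma=2.0)
print(name, value, higher_better)  # ('focal_loss', <mean focal loss>, False)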

Here’s another answer from the PyTorch forum:

BCE_loss = F.binary_cross_entropy_with_logits(inputs, targets, reduction='none')
pt = torch.exp(-BCE_loss)  # probability of the true class; prevents NaNs when the probability is 0
F_loss = self.alpha * (1 - pt)**self.gamma * BCE_loss
return F_loss.mean()
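
To make that fragment self-contained, here is a minimal sketch of a binary focal loss module built around the same idea. The class name, the defaults alpha=0.25 and gamma=2.0, and the per-class alpha_t weighting are assumptions following the original paper, not code from the forum post:

import torch
import torch.nn as nn
import torch.nn.functional as F

class BinaryFocalLoss(nn.Module):
    """Minimal sketch of a focal loss for binary classification with logits.

    alpha weights the positive class and (1 - alpha) the negative class;
    gamma down-weights easy examples. Defaults follow Lin et al. (2017).
    """
    def __init__(self, alpha=0.25, gamma=2.0, reduction='mean'):
        super().__init__()
        self.alpha = alpha
        self.gamma = gamma
        self.reduction = reduction

    def forward(self, inputs, targets):
        # Per-sample BCE computed from logits (numerically stable).
        bce_loss = F.binary_cross_entropy_with_logits(inputs, targets, reduction='none')
        # pt is the model's probability of the true class.
        pt = torch.exp(-bce_loss)
        # alpha_t picks alpha for positives and (1 - alpha) for negatives.
        alpha_t = self.alpha * targets + (1 - self.alpha) * (1 - targets)
        focal = alpha_t * (1 - pt) ** self.gamma * bce_loss
        if self.reduction == 'mean':
            return focal.mean()
        if self.reduction == 'sum':
            return focal.sum()
        return focal

# Usage sketch: logits from the model, float targets in {0., 1.}.
# criterion = BinaryFocalLoss(alpha=0.25, gamma=2.0)
# loss = criterion(logits, targets.float())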

As a suggestion, I think it would be best if this “focal loss” were included in PyTorch itself.

Hi @Mona_Jalal

Thanks for sharing your experience. I have a question about the alpha value in this loss. The common value is 0.25. Is this actually the weight of the class with fewer samples (i.e., the positive class)? In the PyTorch implementation of weighted BCE loss (i.e., F.binary_cross_entropy_with_logits), the positive class (the class with fewer samples) gets the higher weight, close to one, for example 0.7, with 1 - 0.7 as the weight for the negative class (the class with more samples).
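
Without presuming the answer to that question, here is a small sketch (all numbers are illustrative assumptions) showing mechanically how alpha enters the focal loss term versus how an explicit per-class weight enters F.binary_cross_entropy_with_logits:

import torch
import torch.nn.functional as F

logits = torch.tensor([2.0, -1.0])
targets = torch.tensor([1.0, 0.0])

# Focal-loss convention (Lin et al. 2017): alpha multiplies the positive-class term,
# (1 - alpha) multiplies the negative-class term.
alpha, gamma = 0.25, 2.0
bce = F.binary_cross_entropy_with_logits(logits, targets, reduction='none')
pt = torch.exp(-bce)
alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
focal = alpha_t * (1 - pt) ** gamma * bce

# Weighted-BCE convention described above: 0.7 on positives, 0.3 on negatives,
# passed through the per-element `weight` argument.
w = 0.7 * targets + 0.3 * (1 - targets)
weighted_bce = F.binary_cross_entropy_with_logits(logits, targets, weight=w, reduction='none')

print(focal, weighted_bce)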