Focal loss for imbalanced multi-class classification in PyTorch

I'm looking for example code for focal loss in PyTorch for a model that predicts three classes. My model outputs 3 probabilities.

Sentiment_LSTM(
(embedding): Embedding(19612, 400)
(lstm): LSTM(400, 512, num_layers=2, batch_first=True, dropout=0.5)
(dropout): Dropout(p=0.5, inplace=False)
(fc): Linear(in_features=512, out_features=3, bias=True)
(sig): Sigmoid() )

My class distribution is highly imbalanced, so I want to try focal loss to improve accuracy on the minority classes.

I currently use the loss function defined in https://www.kaggle.com/c/tgs-salt-identification-challenge/discussion/65938 but it didn't help.

The original paper (https://arxiv.org/abs/1708.02002) only considers binary classification. How do I extend it to the multi-class scenario?


I don't think you would want sigmoid for multi-class (I'm assuming you mean multi-class rather than multi-label and already train with (unfocused - ha!) cross entropy loss).
If your regular cross entropy loss is "ce_loss", you can just define alpha and gamma and do as in the linked function

ce_loss = torch.nn.functional.cross_entropy(outputs, targets, reduction='none') # important to add reduction='none' to keep per-batch-item loss
pt = torch.exp(-ce_loss)
focal_loss = (alpha * (1-pt)**gamma * ce_loss).mean() # mean over the batch

Best regards

Thomas
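
For reference, here is a minimal self-contained sketch of the snippet above wrapped in a function, assuming raw logits of shape (batch, 3) and integer class labels. The name focal_loss, the random batch, and the defaults alpha=1.0 and gamma=2.0 are illustrative assumptions, not values from this thread.

```python
import torch
import torch.nn.functional as F


def focal_loss(logits, targets, alpha=1.0, gamma=2.0):
    """Multi-class focal loss on raw logits of shape (batch, num_classes)."""
    # Per-sample cross entropy, i.e. -log(p_t) for the true class t.
    ce_loss = F.cross_entropy(logits, targets, reduction="none")
    # Recover p_t, the softmax probability assigned to the true class.
    pt = torch.exp(-ce_loss)
    # Down-weight well-classified samples (p_t close to 1), then average.
    return (alpha * (1 - pt) ** gamma * ce_loss).mean()


torch.manual_seed(0)
logits = torch.randn(8, 3)            # hypothetical batch: 8 samples, 3 classes
targets = torch.randint(0, 3, (8,))   # integer class labels in {0, 1, 2}

plain_ce = F.cross_entropy(logits, targets)
loss_gamma0 = focal_loss(logits, targets, gamma=0.0)
loss_gamma2 = focal_loss(logits, targets, gamma=2.0)
print(plain_ce.item(), loss_gamma0.item(), loss_gamma2.item())
```

With gamma=0 and alpha=1 the function reduces to ordinary cross entropy, which makes for a quick sanity check.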


Here is my network definition. I am not using the sigmoid layer, since the cross-entropy loss takes care of the normalization itself, so I pass the raw logits to the loss function.

import torch.nn as nn

class Sentiment_LSTM(nn.Module):
    """
    We are training the embedded layers along with LSTM for the sentiment analysis
    """

    def __init__(self, vocab_size, output_size, embedding_dim, hidden_dim, n_layers, drop_prob=0.5):
        """
        Setting up the parameters.
        """
        super(Sentiment_LSTM, self).__init__()

        self.output_size = output_size
        self.n_layers = n_layers
        self.hidden_dim = hidden_dim
        
        # embedding layer and LSTM layers 
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, n_layers, 
                            dropout=drop_prob, batch_first=True)
        
        # dropout layer to avoid overfitting
        self.dropout = nn.Dropout(0.5)
        
        # linear and sigmoid layers
        self.fc = nn.Linear(hidden_dim, output_size)
        self.sig = nn.Sigmoid()
        

    def forward(self, x):
        """
        Perform a forward pass

        """
        batch_size = x.size(0)

        x = x.long()
        embeds = self.embedding(x)

        lstm_out, hidden = self.lstm(embeds)

    
        # stack up lstm outputs
        lstm_out = lstm_out.contiguous().view(-1, self.hidden_dim)


        out = self.dropout(lstm_out)
        out = self.fc(out)

        # no sigmoid here: keep the raw logits for the cross-entropy loss
        sig_out = out
        
        # reshape to be batch_size first
        sig_out = sig_out.view(batch_size, -1, self.output_size)
        #print("sig_out",sig_out.shape)
        sig_out = sig_out[:, -1,:] # keep only the last time step
        
        # return the logits of the last time step
        return sig_out
    
    
    def init_hidden(self, batch_size):
        # initializing the hidden state
        weight = next(self.parameters()).data
        
        if (train_on_gpu):
            hidden = (weight.new(self.n_layers, batch_size, self.hidden_dim).zero_().cuda(),
                  weight.new(self.n_layers, batch_size, self.hidden_dim).zero_().cuda())
        else:
            hidden = (weight.new(self.n_layers, batch_size, self.hidden_dim).zero_(),
                      weight.new(self.n_layers, batch_size, self.hidden_dim).zero_())
        
        return hidden

My loss function:

class FocalLoss(nn.Module):
    def __init__(self, alpha=1, gamma=2, logits=False, reduce=True):
        super(FocalLoss, self).__init__()
        self.alpha = alpha
        self.gamma = gamma
        self.logits = logits
        self.reduce = reduce

    def forward(self, inputs, targets):
    
        BCE_loss = nn.CrossEntropyLoss()(inputs, targets, reduce=False)

        pt = torch.exp(-BCE_loss)
        F_loss = self.alpha * (1-pt)**self.gamma * BCE_loss

        if self.reduce:
            return torch.mean(F_loss)
        else:
            return F_loss

My output has size 3: it has to predict whether the sentiment is positive, negative, or neutral.
My data is imbalanced: neutral has around 7000 samples, positive around 250, and negative around 800. Do my understanding and the implementation make sense?


This doesn't look right.

  • You probably just want the functional version (see above) and pass reduction='none' to be modern.
  • I'd not call it BCE_loss.

My impression is that focal loss may help, but there are quite a few ways to do this. The simplest is balanced sampling during training, and a recent one is the weighted loss function from [1901.05555] Class-Balanced Loss Based on Effective Number of Samples.

Best regards

Thomas
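
The two alternatives mentioned above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class counts are the approximate ones quoted in the question, beta=0.999 is an assumed hyper-parameter value, the weight normalisation is one common convention, and the zero-filled features are placeholders for real inputs.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Approximate class counts from the question: neutral, positive, negative.
class_counts = torch.tensor([7000.0, 250.0, 800.0])

# Option 1: class-balanced weights from the effective number of samples,
# E_n = (1 - beta**n) / (1 - beta). beta is an assumed hyper-parameter value.
beta = 0.999
effective_num = (1.0 - torch.pow(torch.tensor(beta), class_counts)) / (1.0 - beta)
class_weights = 1.0 / effective_num
# Normalise so the weights sum to the number of classes (a convention, not a must).
class_weights = class_weights / class_weights.sum() * len(class_counts)
# These weights could then be passed as weight=... to a cross-entropy loss.

# Option 2: balanced sampling. Each sample is drawn with probability
# proportional to 1 / (size of its class), so batches are roughly balanced.
labels = torch.cat([
    torch.full((int(n),), c, dtype=torch.long) for c, n in enumerate(class_counts)
])
features = torch.zeros(len(labels), 1)  # placeholder inputs for illustration
sample_weights = (1.0 / class_counts)[labels]
sampler = WeightedRandomSampler(sample_weights, num_samples=len(labels), replacement=True)
loader = DataLoader(TensorDataset(features, labels), batch_size=64, sampler=sampler)
```

Both options address the imbalance before or inside the plain cross-entropy loss, so they can be tried independently of focal loss.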


Can you answer this as well?

What's the value of alpha here?


alpha is an additional weighting factor between classes. In the paper linked above it is introduced in eq (5).

Best regards

Thomas
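
One way to read eq. (5) in the multi-class case is to keep one alpha value per class and pick the entry that belongs to each sample's true class. The sketch below assumes that reading. The function name and the alpha values are hypothetical, and alpha would normally be tuned or derived from class frequencies.

```python
import torch
import torch.nn.functional as F


def focal_loss_per_class_alpha(logits, targets, alpha, gamma=2.0):
    """Focal loss where alpha is a 1-D tensor holding one weight per class."""
    ce_loss = F.cross_entropy(logits, targets, reduction="none")
    pt = torch.exp(-ce_loss)          # probability of the true class
    alpha_t = alpha[targets]          # per-sample weight, looked up by true class
    return (alpha_t * (1 - pt) ** gamma * ce_loss).mean()


logits = torch.tensor([[2.0, 0.5, 0.1],
                       [0.2, 1.5, 0.3],
                       [0.1, 0.2, 3.0]])
targets = torch.tensor([0, 1, 2])
alpha = torch.tensor([0.1, 1.0, 0.5])  # hypothetical per-class weights
loss = focal_loss_per_class_alpha(logits, targets, alpha)
print(loss.item())
```

With a scalar alpha, as in the earlier snippet, every sample is scaled by the same factor, so only the overall magnitude of the loss changes.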

Why do we need the line below in focal loss? As per the paper, pt = p if y == 1, otherwise 1 - p.

torch.exp(-ce_loss)

Why torch.exp? What does it represent here?

The ce_loss is a negative log likelihood and so torch.exp(-ce_loss) is the likelihood (i.e. between 0 and 1 etc.).

I got it, but why do we need torch.exp here?
Could you please clarify it with a simple example?

I'm not sure I understand? torch.exp of a log likelihood gives you the likelihood because exp is the inverse operation of log.
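
A small worked example may help here. It assumes a single sample with made-up logits. For that sample, the cross-entropy loss is -log of the softmax probability of the true class, so exp(-ce_loss) gives that probability back. This is the multi-class counterpart of the paper's pt.

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 1.0, 0.1]])  # made-up logits for one sample
target = torch.tensor([0])                # true class index

# Probability the model assigns to the true class, computed directly.
probs = F.softmax(logits, dim=1)
p_true = probs[0, target[0]]

# Cross entropy is -log(p_true), so exp(-ce_loss) recovers p_true.
ce_loss = F.cross_entropy(logits, target, reduction="none")
pt = torch.exp(-ce_loss)

print(p_true.item(), ce_loss.item(), pt.item())
```

The first and last printed values agree up to floating-point error, which is why no separate softmax call is needed in the loss.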

alpha is really a hard hyper-parameter...

alpha can't be the balancing factor from the paper; it's just a scaling factor, right?

Would it be correct to use alpha already in the cross_entropy calculation as weight like this?

ce_loss = torch.nn.functional.cross_entropy(outputs, targets, reduction='none', weight=alpha) 
pt = torch.exp(-ce_loss)
focal_loss = ((1-pt)**gamma * ce_loss).mean()

@MaxWolf-01 I was wondering about the same idea. I've tried that, and I am getting unstable loss values (they keep fluctuating between extremes). Without using class weights (i.e., weight=None), the loss values are stable, but focal loss overfits in comparison to nn.CrossEntropyLoss with class weights in that case.

@tom Any ideas here?

Here is the implementation:

import torch


class FocalLoss(torch.nn.Module):
    """Implementation of the focal loss function.

    Args:
        weight: class weight vector to be used in case of class imbalance
        gamma: hyper-parameter for the focal loss scaling.
    """

    def __init__(self, weight=None, gamma=2):
        super(FocalLoss, self).__init__()
        self.gamma = gamma
        self.weight = weight  # weight parameter will act as the alpha parameter to balance class weights

    def forward(self, outputs, targets):
        ce_loss = torch.nn.functional.cross_entropy(outputs, targets, reduction='none', weight=self.weight)
        pt = torch.exp(-ce_loss)
        focal_loss = ((1-pt)**self.gamma * ce_loss).mean()  # mean over the batch
        return focal_loss
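
One thing to note about passing weight into cross_entropy with reduction='none': the per-sample loss becomes weight[y] * (-log p_t), so torch.exp(-ce_loss) equals p_t ** weight[y] rather than p_t itself. Whether this explains the instability reported above is not established here. A possible alternative, sketched below with a hypothetical function name and made-up weights, is to compute pt from the unweighted loss and apply the class weight afterwards.

```python
import torch
import torch.nn.functional as F


def weighted_focal_loss(logits, targets, weight=None, gamma=2.0):
    """Focal loss with class weights applied after pt has been computed."""
    # Unweighted per-sample cross entropy, so exp(-ce) is the true-class probability.
    ce_loss = F.cross_entropy(logits, targets, reduction="none")
    pt = torch.exp(-ce_loss)
    focal = (1 - pt) ** gamma * ce_loss
    if weight is not None:
        focal = weight[targets] * focal   # class weight acts as alpha_t
    return focal.mean()


logits = torch.tensor([[1.0, 2.0, 0.5],
                       [0.3, 0.2, 2.5]])
targets = torch.tensor([1, 2])
weight = torch.tensor([0.2, 3.0, 1.0])    # made-up class weights

# True-class probabilities computed directly from the softmax.
p_true = F.softmax(logits, dim=1)[torch.arange(2), targets]

# With weight passed to cross_entropy, exp(-loss) is p_true ** weight[targets].
weighted_ce = F.cross_entropy(logits, targets, weight=weight, reduction="none")
pt_from_weighted = torch.exp(-weighted_ce)

loss = weighted_focal_loss(logits, targets, weight=weight)
print(p_true, pt_from_weighted, loss.item())
```

For the first sample (weight 3.0) the value recovered from the weighted loss is noticeably smaller than the actual probability, so the modulating factor (1 - pt) ** gamma is larger than the paper's definition would give.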