How to implement focal loss in pytorch?

Ben · August 20, 2017, 2:27pm

I implemented multi-class Focal Loss in pytorch. Bellow is the code. log_pred_prob_onehot is batched log_softmax in one_hot format, target is batched target in number(e.g. 0, 1, 2, 3).

class FocalLoss(torch.nn.Module):
    def __init__(self, gamma=2):
        super().__init__()
        self.gamma = gamma

    def forward(self, log_pred_prob_onehot, target):
        pred_prob_oh = torch.exp(log_pred_prob_onehot)
        pt = Variable(pred_prob_oh.data.gather(1, target.data.view(-1, 1)), requires_grad=True)
        modulator = (1 - pt) ** self.gamma
        mce = modulator * (-torch.log(pt))

        return mce.mean()

However, when I tested it, it worked poorly. I read the Focal Loss paper a couple of times. It seems straight. Maybe I didn’ t understand it very well. I’d appreciate if anybody can correct me! Or if there is a workable implementation, please let me know! Thanks in advance!

Ben

BowieHsu · August 21, 2017, 11:22am

https://github.com/kuangliu/pytorch-retinanet maybe this repo can solve your problem?

Ben · August 22, 2017, 7:15am

Thanks a lot @BowieHsu!

I also found this one is very good.

github.com

DingKe/pytorch_workplace/blob/master/focalloss/loss.py

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable


def one_hot(index, classes):
    size = index.size() + (classes,)
    view = index.size() + (1,)

    mask = torch.Tensor(*size).fill_(0)
    index = index.view(*view)
    ones = 1.

    if isinstance(index, Variable):
        ones = Variable(torch.Tensor(index.size()).fill_(1))
        mask = Variable(mask, volatile=index.volatile)

    return mask.scatter_(1, index, ones)

This file has been truncated. show original

I ported this code to my program and it works.

Ben · August 22, 2017, 7:18am

I guess there is something wrong in the original code which breaks the computation graph and makes loss not decrease. I doubt it is this line:

    pt = Variable(pred_prob_oh.data.gather(1, target.data.view(-1, 1)), requires_grad=True)

Is torch.gather support autograd? Is there anyway to implement this?
Many thanks!

BowieHsu · August 26, 2017, 9:10am

@ben Hi, ben,
have you tried focal loss on SSD or FasteRcnn?
How much mAP will be improved?

Ben · August 29, 2017, 7:28am

@BowieHsu I used it in my own project, which has a multi-class, unbalanced data set. So far, it is not as good as it is mentioned in the paper. It is still very hard to train the data.

trypag · September 1, 2017, 8:26am

I also tried to use it in my own project, I found I had to reduce the lr by a factor of 10, leading to a better first iteration, but then due to the reduced lr, the precision over the epochs is barely improving. Maybe increasing lr after the first epoch could improve.

xwy · September 25, 2017, 6:56am

I tried it in my project, in FPN in faster rcnn, not as good as cross entropy though.

Hengck · October 5, 2017, 4:47am

i haven’t read the paper in deatils. But I thought the the term (1-p)^gamma and p^gamma are for weighing only. They should not be back propagated during gradient descent. Maybe you need to detach() your variables?

Hengck · October 7, 2017, 4:08am

after some checking, the weighing terms (1-p)^gamma and p^gamma are back propagated as well. you can refer to:

https://github.com/zimenglan-sysu-512/paper-note/blob/master/focal_loss.pdf

Kento_Doi · December 22, 2017, 2:08am

Hi Ben.
Have you confirmed that training with gamma=0 is same to with cross entropy loss?
I tried that in my implementation of focal loss. The result became very different

And I ask someone to answer my forum question. I can’t identify the problem.

Ben · January 3, 2018, 9:26am

I found this one is pretty good, except some small grammar issues in python3. Enjoy!

github.com

marvis/pytorch-yolo2/blob/master/FocalLoss.py

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# --------------------------------------------------------
# Licensed under The MIT License [see LICENSE for details]
# Written by Chao CHEN (chaochancs@gmail.com)
# Created On: 2017-08-11
# --------------------------------------------------------
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

class FocalLoss(nn.Module):
    r"""
        This criterion is a implemenation of Focal Loss, which is proposed in 
        Focal Loss for Dense Object Detection.
            
            Loss(x, class) = - \alpha (1-softmax(x)[class])^gamma \log(softmax(x)[class])
    
        The losses are averaged across observations for each minibatch.

This file has been truncated. show original

Kento_Doi · February 26, 2018, 10:57am

Thank you for helping me, Ben!!

I completed implmentation of focal loss for semantic segmentation.

github.com

doiken23/pytorch_toolbox/blob/master/focalloss2d.py

####################################################
##### This is focal loss class for multi class #####
##### University of Tokyo Doi Kento            #####
####################################################

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
# I refered https://github.com/c0nn3r/RetinaNet/blob/master/focal_loss.py

class FocalLoss2d(nn.Module):

    def __init__(self, gamma=0, weight=None, size_average=True):
        super(FocalLoss2d, self).__init__()

        self.gamma = gamma
        self.weight = weight
        self.size_average = size_average

This file has been truncated. show original

edgarriba · April 1, 2019, 3:28pm

you can find now one_hot and focal loss implementations in torchgeometry:
https://torchgeometry.readthedocs.io/en/latest/losses.html#torchgeometry.losses.one_hot
https://torchgeometry.readthedocs.io/en/latest/losses.html#torchgeometry.losses.FocalLoss

mattsim · June 11, 2019, 11:16am

I’d like to add something to this, since it was leading me to an error.
The implementation suggested by @Ben (marvis/pytorch-yolo2/blob/master/FocalLoss.py) beware specify alpha as a (C,1)-shaped tensor rather than a (C,).
Otherwise, the implementation will be still working, but the loss will be computed as a dot product between the batch alpha values and the batch class probabilities, which makes conceptually no sense.

edgarriba · October 22, 2019, 9:22pm

quick update: this can be found in kornia:
https://kornia.readthedocs.io/en/latest/losses.html#kornia.losses.FocalLoss

edwardpwtsoi · November 1, 2019, 3:46am

class FocalLoss(torch.nn.Module):
    def __init__(self, gamma=2):
        super(FocalLoss, self).__init__()
        self.gamma = gamma

    def forward(self, inputs, targets):
        ps = F.softmax(inputs, dim=1)
        ne_ps = 1 - ps
        ws = torch.pow(ne_ps, self.gamma)
        return F.nll_loss(ps.pow(ws), targets)

Can anyone give me comment on my implementation?
I was using it to solve classification problem with imbalance class
~~but when I checked the TensorBoard logging, the loss is always the same~~
I changed to nll_loss and the loss is changing now, but I am still looking forward to having others comments

Devraj_Mandal · November 14, 2019, 2:08pm

@edwardpwtsoi I think if u are using NLL loss then u have to do logsoftmax.

name_space · January 6, 2020, 3:54pm

How about the ignore_index apply to focal loss?

PhilipMay · December 22, 2020, 8:39am

Just FYI: This comment here:

Sais that there is an implementation of focal loss in torchvision.