https://github.com/kuangliu/pytorch-retinanet maybe this repo can solve your problem?
Thanks a lot @BowieHsu!
I also found this one to be very good.
I ported this code to my program and it works.
I guess something in the original code breaks the computation graph, which keeps the loss from decreasing. I suspect it is this line:
pt = Variable(pred_prob_oh.data.gather(1, target.data.view(-1, 1)), requires_grad=True)
Does torch.gather support autograd? Is there any way to implement this?
Many thanks!
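For what it's worth, torch.gather does support autograd; the graph breaks in the line quoted above because going through .data and rewrapping in Variable(..., requires_grad=True) cuts the history. A minimal sketch under current PyTorch (where Variable is merged into Tensor; my own example, not code from the repo):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 3, requires_grad=True)
probs = F.softmax(logits, dim=1)
targets = torch.tensor([0, 2, 1, 0])

# Differentiable: gather the probability of the target class per sample,
# without detaching via .data or rewrapping the result.
pt = probs.gather(1, targets.view(-1, 1)).squeeze(1)
loss = -torch.log(pt).mean()
loss.backward()

# Gradients flow all the way back to the input logits.
assert logits.grad is not None
```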
@BowieHsu I used it in my own project, which has a multi-class, unbalanced data set. So far, the results are not as good as the paper suggests. The data is still very hard to train on.
I also tried to use it in my own project. I found I had to reduce the learning rate by a factor of 10, which gave a better first iteration, but with the reduced learning rate the precision barely improves over the epochs. Maybe increasing the learning rate after the first epoch would help.
I tried it in my project, with FPN in Faster R-CNN, but it was not as good as cross entropy.
I haven't read the paper in detail, but I thought the terms (1-p)^gamma and p^gamma are for weighting only. They should not be backpropagated during gradient descent. Maybe you need to detach() your variables?
After some checking, the weighting terms (1-p)^gamma and p^gamma are backpropagated as well. You can refer to:
https://github.com/zimenglan-sysu-512/paper-note/blob/master/focal_loss.pdf
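The difference between the two choices is easy to see empirically. A small sketch (my own construction, not code from the note above) comparing the gradient when the weighting term (1-pt)^gamma is backpropagated versus detached:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
gamma = 2.0
logits = torch.randn(8, 5, requires_grad=True)
targets = torch.randint(0, 5, (8,))

log_pt = F.log_softmax(logits, dim=1).gather(1, targets.view(-1, 1)).squeeze(1)
pt = log_pt.exp()

# Variant A: the weight (1 - pt)^gamma participates in backprop,
# as in the derivative worked out in the note linked above.
loss_a = (-(1 - pt) ** gamma * log_pt).mean()

# Variant B: the weight is treated as a constant via detach().
loss_b = (-(1 - pt).detach() ** gamma * log_pt).mean()

grad_a = torch.autograd.grad(loss_a, logits, retain_graph=True)[0]
grad_b = torch.autograd.grad(loss_b, logits)[0]
# The two gradients generally differ, so the choice matters for training.
```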
Hi Ben.
Have you confirmed that training with gamma=0 is the same as with cross entropy loss?
I tried that in my implementation of focal loss, and the result became very different.
I also asked someone to answer my forum question, but I can't identify the problem.
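One quick sanity check along these lines, assuming the usual -(1 - pt)^gamma * log(pt) formulation averaged over the batch (a sketch, not anyone's exact implementation): at gamma = 0 it should match F.cross_entropy exactly.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0):
    # Per-sample focal term: -(1 - pt)^gamma * log(pt), then batch mean.
    log_pt = F.log_softmax(logits, dim=1).gather(1, targets.view(-1, 1)).squeeze(1)
    pt = log_pt.exp()
    return (-(1 - pt) ** gamma * log_pt).mean()

torch.manual_seed(0)
logits = torch.randn(16, 4)
targets = torch.randint(0, 4, (16,))

fl0 = focal_loss(logits, targets, gamma=0.0)  # should equal cross entropy
ce = F.cross_entropy(logits, targets)
```

If your implementation fails this check at gamma=0, the bug is somewhere other than the focal weighting itself.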
I found this one is pretty good, except for some small syntax issues under Python 3. Enjoy!
Thank you for helping me, Ben!!
I completed an implementation of focal loss for semantic segmentation.
You can now find one_hot and focal loss implementations in torchgeometry:
https://torchgeometry.readthedocs.io/en/latest/losses.html#torchgeometry.losses.one_hot
https://torchgeometry.readthedocs.io/en/latest/losses.html#torchgeometry.losses.FocalLoss
I’d like to add something to this, since it was leading me to an error.
Beware: the implementation suggested by @Ben (marvis/pytorch-yolo2/blob/master/FocalLoss.py) expects alpha as a (C, 1)-shaped tensor rather than a (C,)-shaped one.
Otherwise the implementation will still run, but the loss will be computed as a dot product between the batch alpha values and the batch class probabilities, which makes no conceptual sense.
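A toy illustration of the broadcasting pitfall (my own example, not the exact code from that repo):

```python
import torch

C, N = 3, 4
alpha_flat = torch.tensor([0.25, 0.5, 1.0])  # shape (C,)
alpha_col = alpha_flat.view(-1, 1)           # shape (C, 1)
targets = torch.tensor([2, 0, 1, 2])

# Indexing a (C, 1) tensor yields one weight per sample, shape (N, 1):
w_col = alpha_col[targets]

# Indexing the flat (C,) tensor yields shape (N,); multiplied against an
# (N, 1) probability column it broadcasts to (N, N) instead of weighting
# element-wise, which is the dot-product-like confusion described above.
pt = torch.rand(N, 1)
bad = alpha_flat[targets] * pt   # shape (N, N): silent broadcasting bug
good = w_col * pt                # shape (N, 1): per-sample weighting
```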
quick update: this can be found in kornia:
https://kornia.readthedocs.io/en/latest/losses.html#kornia.losses.FocalLoss
import torch
import torch.nn.functional as F

class FocalLoss(torch.nn.Module):
    def __init__(self, gamma=2):
        super(FocalLoss, self).__init__()
        self.gamma = gamma

    def forward(self, inputs, targets):
        ps = F.softmax(inputs, dim=1)
        ne_ps = 1 - ps
        ws = torch.pow(ne_ps, self.gamma)
        return F.nll_loss(ps.pow(ws), targets)
Can anyone give me comments on my implementation?
I was using it to solve a classification problem with imbalanced classes,
but when I checked the TensorBoard logging, the loss was always the same.
I changed to nll_loss and the loss is changing now, but I am still looking forward to hearing others' comments.
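One issue with the snippet above: F.nll_loss expects log-probabilities, and ps.pow(ws) raises probabilities to element-wise powers, which is neither the log term nor the focal weighting. A sketch of how the forward pass could be written instead (one common formulation, not the canonical implementation):

```python
import torch
import torch.nn.functional as F

class FocalLoss(torch.nn.Module):
    def __init__(self, gamma=2.0):
        super().__init__()
        self.gamma = gamma

    def forward(self, inputs, targets):
        # F.nll_loss expects log-probabilities, so weight the log_softmax
        # output by (1 - p)^gamma instead of raising probabilities to a power.
        log_ps = F.log_softmax(inputs, dim=1)
        weighted = (1 - log_ps.exp()) ** self.gamma * log_ps
        return F.nll_loss(weighted, targets)

torch.manual_seed(0)
logits = torch.randn(8, 4)
targets = torch.randint(0, 4, (8,))
loss = FocalLoss(gamma=0.0)(logits, targets)  # gamma=0 reduces to cross entropy
```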
How about applying ignore_index to focal loss?
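One way to get ignore_index behavior is to mask out ignored targets before reducing (a sketch of the idea only; kornia's own API may handle this differently):

```python
import torch
import torch.nn.functional as F

def focal_loss_ignore(logits, targets, gamma=2.0, ignore_index=-100):
    # Drop ignored positions before computing the per-sample focal term,
    # so they contribute neither to the sum nor to the mean's denominator.
    valid = targets != ignore_index
    logits, targets = logits[valid], targets[valid]
    log_pt = F.log_softmax(logits, dim=1).gather(1, targets.view(-1, 1)).squeeze(1)
    return (-(1 - log_pt.exp()) ** gamma * log_pt).mean()

torch.manual_seed(0)
logits = torch.randn(5, 3)
targets = torch.tensor([0, -100, 2, 1, -100])
loss = focal_loss_ignore(logits, targets)
```

At gamma=0 this matches F.cross_entropy with the same ignore_index, which is a handy sanity check.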
Just FYI: this comment here says that there is an implementation of focal loss in torchvision.
I took a look at the source of kornia's non-binary focal loss implementation here. Could you please explain the purpose of the alpha parameter here? As far as I can see, it is a float constant and not a tensor, meaning that each class will be weighted with the same float value.