How to implement focal loss in PyTorch?

Hengck · October 5, 2017, 4:47am

I haven’t read the paper in detail, but I thought the terms (1-p)^gamma and p^gamma are for weighting only, so gradients should not be backpropagated through them during gradient descent. Maybe you need to detach() your variables?
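To make the detach() idea concrete, here is a minimal sketch of a binary focal loss, not a definitive implementation. The function name `binary_focal_loss` and the `detach_weight` flag are made up for illustration; they are not part of PyTorch. It assumes raw logits and float 0/1 targets. Writing the weight as (1 - p_t)^gamma, where p_t is the predicted probability of the true class, covers both terms at once: it is (1-p)^gamma for positives and p^gamma for negatives.

```python
import torch
import torch.nn.functional as F


def binary_focal_loss(logits, targets, gamma=2.0, detach_weight=False):
    """Hypothetical helper: focal loss for binary targets given raw logits."""
    # Per-element cross-entropy, computed stably from logits; equals -log(p_t).
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    # Recover p_t, the predicted probability of the true class.
    p_t = torch.exp(-bce)
    # Modulating factor: (1-p)^gamma for positives, p^gamma for negatives.
    weight = (1.0 - p_t) ** gamma
    if detach_weight:
        # Treat the factor as a constant so no gradient flows through it.
        weight = weight.detach()
    return (weight * bce).mean()


logits = torch.randn(8, requires_grad=True)
targets = torch.randint(0, 2, (8,)).float()
loss = binary_focal_loss(logits, targets, gamma=2.0, detach_weight=True)
loss.backward()
```

The loss value is the same either way; only the gradient changes. With `detach_weight=False` the factor also contributes to the gradient, so the flag makes it easy to try both and compare.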