How to implement focal loss in pytorch?

I haven't read the paper in detail, but I thought the terms (1-p)^gamma and p^gamma are only for weighting. They should not be back-propagated during gradient descent. Maybe you need to detach() your variables?
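
For illustration, here is a minimal sketch of what that could look like (function name, gamma default, and the multi-class cross-entropy formulation are my own assumptions, not from the paper or the original question): the modulating factor (1 - p_t)^gamma is computed and then detach()ed so it acts as a pure per-sample weight and no gradient flows through it.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0):
    """Hypothetical focal-loss sketch for multi-class classification.

    logits:  (N, C) raw, unnormalized scores
    targets: (N,) integer class labels
    gamma:   focusing parameter (2.0 is a common choice)
    """
    # Per-sample cross entropy: -log(p_t)
    ce = F.cross_entropy(logits, targets, reduction="none")
    # Recover p_t, the softmax probability of the true class
    p_t = torch.exp(-ce)
    # Modulating factor (1 - p_t)^gamma, detached so it is treated
    # as a constant weight during backpropagation
    weight = (1.0 - p_t).pow(gamma).detach()
    return (weight * ce).mean()
```

Whether to detach the factor is a design choice; some implementations let the gradient flow through (1 - p_t)^gamma as well, so it's worth checking what the paper's formulation implies for your use case.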