i haven’t read the paper in deatils. But I thought the the term (1-p)^gamma and p^gamma are for weighing only. They should not be back propagated during gradient descent. Maybe you need to detach() your variables?
i haven’t read the paper in deatils. But I thought the the term (1-p)^gamma and p^gamma are for weighing only. They should not be back propagated during gradient descent. Maybe you need to detach() your variables?