This may not be correct when you use a class weight: if the weight is passed to `F.cross_entropy`, it is baked into the loss before the `(1 - p)**gamma` factor is applied, so a `p` recovered from that weighted loss is distorted. I think @arkrde's answer works. In PyTorch, `F.cross_entropy(x, y)` is equivalent to `F.nll_loss(F.log_softmax(x, dim=-1), y)`, so the weight (i.e. the alpha) should be attached on the `nll_loss()` step, with `p` taken from the unweighted log-probabilities.
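A minimal sketch of what I mean (the `focal_loss` helper name and signature are mine, not from any library; standard PyTorch assumed):

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, target, alpha=None, gamma=2.0):
    # pt must come from the UNWEIGHTED log-probabilities;
    # recovering it from a weighted cross-entropy would distort it.
    log_p = F.log_softmax(logits, dim=-1)
    pt = log_p.gather(-1, target.unsqueeze(-1)).squeeze(-1).exp()
    # Attach the class weight (the alpha) only here, on nll_loss.
    ce = F.nll_loss(log_p, target, weight=alpha, reduction="none")
    return ((1 - pt) ** gamma * ce).mean()
```

With `gamma=0` and no `alpha` this should reduce to plain `F.cross_entropy`, which is an easy sanity check.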