This is my code:
probs = (P * class_mask)                          # P: predicted probabilities; class_mask selects the target classes
pt = torch.pow(1 - probs, 0)                      # x ** 0 == 1, so pt is identically 1
pt.detach_()                                      # detached in place, so it carries no gradient history
logmean = (probs.log()).sum(1).view(-1, 1)        # variant 1: without pt
logmean = (pt * probs.log()).sum(1).view(-1, 1)   # variant 2: with pt
The two variants of logmean above (with and without pt) give different test accuracies: the first variant always reaches a higher accuracy than the second, and I have repeated the experiment many times. In my opinion, since pt is identically equal to 1 and has been detached, multiplying by pt should have no impact on the gradients. Is there anything I am misunderstanding? Thanks.
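To illustrate my reasoning, here is a minimal standalone check (x is a hypothetical stand-in for probs, kept away from 0 so log() stays finite): multiplying by a detached all-ones tensor should leave both the values and the gradients bitwise unchanged.

import torch

torch.manual_seed(0)
x = (torch.rand(4, 3) * 0.9 + 0.05).requires_grad_()  # leaf tensor in (0.05, 0.95)

pt = torch.pow(1 - x, 0).detach()   # identically 1.0 and detached, like pt above

loss_no_pt = x.log().sum(1).view(-1, 1).sum()         # variant 1: without pt
loss_with_pt = (pt * x.log()).sum(1).view(-1, 1).sum()  # variant 2: with pt

g_no_pt, = torch.autograd.grad(loss_no_pt, x)
g_with_pt, = torch.autograd.grad(loss_with_pt, x)
print(torch.equal(g_no_pt, g_with_pt))  # True: multiplying by exactly 1.0 is a no-op

This prints True for me, which is why I expect the two training runs to behave identically.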