A really strange phenomenon: multiplying the loss by a factor that is always 1 changes test accuracy

This is my code:

import torch

probs = P * class_mask
pt = torch.pow(1 - probs, 0)  # exponent is 0, so pt is all ones
# Version 1: without pt
logmean = probs.log().sum(1).view(-1, 1)
# Version 2: with pt
logmean = (pt * probs.log()).sum(1).view(-1, 1)

The two loss functions above, with and without pt, give different test accuracy: the first (without pt) always has higher accuracy than the second. I have tried this many times. But in my opinion, since pt is identically equal to 1 and has been detached, it should have no impact on the gradients. Is there anything I am misunderstanding? Thanks.
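
For reference, here is a minimal sketch of how the two variants could be compared directly on identical inputs. The logits, targets, and the helper name grad_of_loss are made up for illustration and are not from my training code. It also assumes that probs is reduced over the class dimension before taking the log, so that log(0) never occurs, and that pt is detached explicitly.

```python
import torch

torch.manual_seed(0)

# Hypothetical inputs: 4 samples, 3 classes (not from the original training code).
logits = torch.randn(4, 3, dtype=torch.float64)
targets = torch.tensor([0, 2, 1, 2])
class_mask = torch.nn.functional.one_hot(targets, num_classes=3).to(logits.dtype)


def grad_of_loss(use_pt):
    """Return (loss value, gradient w.r.t. the logits) for one variant."""
    x = logits.clone().requires_grad_(True)
    P = torch.softmax(x, dim=1)
    # Assumption: sum over classes first, so only the true-class probability
    # reaches log() and no zero entries are passed to it.
    probs = (P * class_mask).sum(1).view(-1, 1)
    log_p = probs.log()
    if use_pt:
        # Exponent 0 makes pt all ones; detach() removes it from the graph.
        pt = torch.pow(1 - probs, 0).detach()
        log_p = pt * log_p
    loss = -log_p.mean()
    loss.backward()
    return loss.item(), x.grad


loss_a, grad_a = grad_of_loss(use_pt=False)
loss_b, grad_b = grad_of_loss(use_pt=True)

same_loss = abs(loss_a - loss_b) < 1e-12
same_grad = torch.allclose(grad_a, grad_b)
print("same loss:", same_loss, "| same gradient:", same_grad)
```

If a check like this reports identical losses and gradients, the accuracy gap would more likely come from something outside the loss, such as run-to-run randomness in initialization or data shuffling, though that is only a guess.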