BCE with pos_weight will never have 0 gradient?

I want to use BCEWithLogitsLoss with a pos_weight value > 1 to deal with class imbalance in my dataset. However, I noticed that the pos_weight term makes the gradient of the loss nonzero everywhere, even at a perfect prediction.
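
For context, here is roughly how I'm setting it up (the pos_weight value of 5.0 is just a placeholder for my actual imbalance ratio):

```python
import torch
import torch.nn as nn

# Placeholder ratio: roughly 5 negatives per positive in my dataset
criterion = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([5.0]))

logits = torch.randn(8, 1)                      # raw model outputs (pre-sigmoid)
targets = torch.randint(0, 2, (8, 1)).float()   # hard 0/1 labels
loss = criterion(logits, targets)
```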

L = -pos_weight * y * log(yhat) - (1 - y) * log(1 - yhat)
dL/dyhat = -pos_weight * y / yhat + (1 - y) / (1 - yhat)
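
Here's a quick autograd sanity check of this derivative (I'm writing the loss directly in terms of the probability yhat rather than the logit, with arbitrary example values):

```python
import torch

pos_weight = 3.0
y = torch.tensor(1.0)
yhat = torch.tensor(0.7, requires_grad=True)

# Weighted BCE written directly in terms of yhat
L = -pos_weight * y * torch.log(yhat) - (1 - y) * torch.log(1 - yhat)
L.backward()

analytic = -pos_weight * y / yhat.detach() + (1 - y) / (1 - yhat.detach())
print(yhat.grad.item(), analytic.item())  # both ≈ -4.2857
```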

Even if my model predicts perfectly, i.e. yhat = y, the gradient is -pos_weight * yhat/yhat + (1 - yhat) / (1 - yhat) = -pos_weight + 1, which is < 0 for pos_weight > 1.
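
Here's what I ran to convince myself (I use a soft target y in (0, 1) so that yhat = y is in the interior of the domain; with a hard 0/1 label the substitution hits 0/0):

```python
import torch

pos_weight = 3.0
y = torch.tensor(0.4)                    # soft target so yhat = y is well-defined
yhat = y.clone().requires_grad_(True)    # "perfect" prediction

L = -pos_weight * y * torch.log(yhat) - (1 - y) * torch.log(1 - yhat)
L.backward()
print(yhat.grad.item())  # 1 - pos_weight = -2.0, not 0
```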

This seems odd, because it suggests that the model will continue to make (potentially large) updates even when it is already making correct predictions. Am I misunderstanding something here?