Biased logistic regression

John_Grace · May 4, 2022, 12:19am

I am building a binary classifier in Python.
My model has a decently high AUC (90%), but it is biased: it underestimates the probability that $y=1$. The underestimate is also systematic when I look across some of the input features. How can I nudge the bias term, or otherwise address this issue? I am surprised that the model ends up biased even though it has an intercept (bias term).
Perhaps there’s a way to tell it after training: “now scale up the bias term so that the sum of predicted probabilities (y_hat) matches the sum of the actual (real) ys.”
Or how would I address a bias issue like this?
My dataset is very unbalanced: only 3% positives (1s) vs. 97% negatives (0s). But in y_hat, the predicted share of 1s is closer to 2.5%.
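For illustration, the post-hoc adjustment described above (shifting the bias term until the sum of predicted probabilities matches the sum of the labels) could look roughly like the sketch below. It is a minimal sketch, not a definitive fix. It assumes you can get the model's pre-sigmoid outputs (logits); the function name `intercept_shift` and the toy `logits` and `labels` are hypothetical and only there to make the example runnable. Adding the same constant to every logit does not change their ordering, so AUC is unaffected. Ideally the shift would be estimated on held-out data rather than the training set.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def intercept_shift(logits, target_sum, lo=-20.0, hi=20.0, tol=1e-10):
    """Find delta so that sum(sigmoid(logit + delta)) is close to target_sum.

    The sum is strictly increasing in delta, so bisection is enough.
    Assumes 0 < target_sum < len(logits) and that the answer lies in [lo, hi].
    """
    for _ in range(200):
        mid = (lo + hi) / 2.0
        total = sum(sigmoid(z + mid) for z in logits)
        if total < target_sum:
            lo = mid   # predictions still too low: shift further up
        else:
            hi = mid   # predictions too high: shift down
        if hi - lo < tol:
            break
    return (lo + hi) / 2.0

# Hypothetical logits from an already-trained model, with their true labels.
logits = [-4.2, -3.9, -3.5, -5.0, -2.8, -0.4, -4.6, -3.1, 0.9, -4.0]
labels = [0, 0, 0, 0, 0, 1, 0, 0, 1, 0]

delta = intercept_shift(logits, target_sum=sum(labels))
adjusted = [sigmoid(z + delta) for z in logits]

print("shift added to the bias term:", delta)
print("sum of y_hat before:", sum(sigmoid(z) for z in logits))
print("sum of y_hat after: ", sum(adjusted))
```

In a real model, `delta` would then simply be added to the bias of the final linear layer (or to the logits at prediction time).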

eqy · May 4, 2022, 3:40am

Are you weighting the loss function in some way, e.g., via the weight parameter? See BCELoss — PyTorch 1.11.0 documentation.
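To see why the weighting question matters for calibration, consider the simplest possible case: a model with only a bias term. Minimizing binary cross-entropy with the positive examples weighted by $w$ gives the predicted probability $p = w n_1 / (w n_1 + n_0)$, where $n_1$ and $n_0$ are the counts of positives and negatives. With $w = 1$ this is exactly the base rate; any other weight moves the average prediction away from it. The sketch below just evaluates that formula. The function name `weighted_base_rate` and the counts are hypothetical, and the single positive-class weight is an assumption made for simplicity (it resembles `pos_weight` in `BCEWithLogitsLoss`, whereas the `weight` argument of `BCELoss` rescales individual elements).

```python
def weighted_base_rate(n_pos, n_neg, pos_weight=1.0):
    """Probability predicted by an intercept-only model that minimizes
    BCE with positives weighted by pos_weight (closed-form minimizer)."""
    return pos_weight * n_pos / (pos_weight * n_pos + n_neg)

# Hypothetical dataset with 3% positives, as in the question.
n_pos, n_neg = 30, 970

# No weighting: the prediction equals the true base rate.
print(weighted_base_rate(n_pos, n_neg))

# "Balanced" weighting (n_neg / n_pos): the prediction is pushed to one half.
print(weighted_base_rate(n_pos, n_neg, pos_weight=n_neg / n_pos))

# Down-weighting positives: the prediction falls below the base rate.
print(weighted_base_rate(n_pos, n_neg, pos_weight=0.5))
```

So if any class or sample weighting is in play, a gap between the mean of y_hat and the true positive rate is expected, and the predictions would need to be recalibrated afterwards.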