Failing to Replicate sklearn LogisticRegression in PyTorch. Any help?

BlakeWest · August 5, 2019, 12:11am

Hi, I’ve been using sklearn for a while in a personal project, and it’s generally been very good. I’ve built a model that works pretty well using its built-in LogisticRegression. However, I know some of the features don’t have a linear response. I also know generally how to use PyTorch, so I wanted to try incorporating it into my project. Ideally, I’d like to run the features I believe are non-linear through some shallow NNs, and then concatenate their outputs with the linear features into a basic LogisticRegression model. One key thing I need is for the probabilities to be well calibrated. NNs typically don’t provide this, but sklearn’s LogisticRegression does it extremely well.
So! I want to replicate sklearn’s logistic regression in PyTorch, and then build the model described above in an end-to-end fashion. However, I can’t seem to get it to replicate, and I’m hoping I can get some advice. The sklearn docs show the loss function, and from the source code it appears to use LibLinear and to essentially optimize an NLLLoss with an L2 penalty. Seemed easy enough with PyTorch, but no luck!
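
For reference, the objective being matched can be written out explicitly. The sketch below assumes binary 0/1 labels and the C-parameterized form 0.5·||w||² + C·Σ log-loss; the function names are made up for illustration and are not part of sklearn or PyTorch. It also computes the weight decay that would correspond to a given C when the PyTorch loss is averaged over N samples.

```python
import torch
import torch.nn.functional as F


def sklearn_style_objective(logits, y, weight, C=1.0):
    """L2-penalized logistic loss in the C-parameterized form:
    0.5 * ||w||^2 + C * sum_i logloss_i.

    logits: (N,) raw scores, y: (N,) float labels in {0, 1},
    weight: coefficient tensor (the intercept is left unpenalized here,
    which is an assumption of this sketch).
    """
    data_term = F.binary_cross_entropy_with_logits(logits, y, reduction="sum")
    return 0.5 * weight.pow(2).sum() + C * data_term


def equivalent_weight_decay(C, n_samples):
    """Weight decay matching C when the data loss is a mean over n_samples.

    Dividing the objective above by C * N gives
    mean_logloss + (1 / (2 * C * N)) * ||w||^2, and a weight decay of lam
    corresponds to a penalty of (lam / 2) * ||w||^2 added to the loss.
    """
    return 1.0 / (C * n_samples)
```

Under this mapping, sklearn's default C=1 with N training samples corresponds to a weight decay of 1/N on a mean-reduced loss, so a weight decay on the order of 1 would be a much stronger penalty. Note also that PyTorch optimizers apply weight_decay to every parameter they are given, including the bias.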

Here’s what I’ve tried…

import torch.nn as nn

class DeepLogisticRegression(nn.Module):
    def __init__(self, num_in):
        super().__init__()
        output_units = 2  # one output per class
        self.linear = nn.Linear(num_in, output_units)
        self.sigmoid = nn.Sigmoid()
        self.sequential = nn.Sequential(self.linear, self.sigmoid)

    def forward(self, X):
        # Returns per-class sigmoid activations, not log-probabilities
        return self.sequential(X.float())

# Training setup: Adagrad(lr=0.001, weight_decay=2), batch size 4096

Note that I’m fitting the above using Adagrad, and the weight decay is meant to replicate sklearn’s L2 penalty. I run this for about 30 epochs. I’ve tried various configurations, including SGD, different weight decay, different lr, etc. But my PyTorch version always ends up stuck in a local minimum, where it essentially always picks the class that is slightly more common (62.26% of total samples). I can’t seem to get it to actually “train” and find a legit model. But sklearn’s implementation has no problem with this.
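
A few things in the setup above could plausibly explain the majority-class collapse. If the training loss is nn.NLLLoss, it expects log-probabilities, not sigmoid outputs. And weight_decay=2 on a mean-reduced loss is a very strong penalty compared with sklearn's default C=1, which works out to roughly 1/N in weight-decay terms. Below is a minimal sketch of a closer replication, assuming binary 0/1 labels: a single logit, BCEWithLogitsLoss, an explicit L2 penalty on the weights only, and full-batch LBFGS in place of Adagrad. The data is synthetic and the class name is hypothetical.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic stand-in data (hypothetical; replace with the real features/labels).
N, D = 2000, 5
X = torch.randn(N, D)
true_w = torch.tensor([1.5, -2.0, 0.5, 0.0, 1.0])
y = (torch.rand(N) < torch.sigmoid(X @ true_w + 0.5)).float()


class LogisticRegressionTorch(nn.Module):
    def __init__(self, num_in):
        super().__init__()
        self.linear = nn.Linear(num_in, 1)  # one logit; no sigmoid in the model

    def forward(self, X):
        return self.linear(X.float()).squeeze(-1)  # raw logits, shape (N,)


model = LogisticRegressionTorch(D)
C = 1.0  # inverse regularization strength, as in sklearn
loss_fn = nn.BCEWithLogitsLoss(reduction="sum")
optimizer = torch.optim.LBFGS(
    model.parameters(), lr=1.0, max_iter=200, line_search_fn="strong_wolfe"
)


def closure():
    # LBFGS re-evaluates the full-batch objective several times per step.
    optimizer.zero_grad()
    penalty = 0.5 * model.linear.weight.pow(2).sum()  # L2 on weights, not bias
    loss = C * loss_fn(model(X), y) + penalty
    loss.backward()
    return loss


optimizer.step(closure)

with torch.no_grad():
    probs = torch.sigmoid(model(X))  # predicted P(y = 1)
    accuracy = ((probs > 0.5).float() == y).float().mean().item()
```

Standardizing the features before fitting tends to help any gradient-based optimizer here. Whether the intercept should be penalized depends on which sklearn solver is being matched, so treat the weights-only penalty as an assumption.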

Any ideas or thoughts would be so much appreciated! Thank you, thank you!

PS: I guess theoretically, if I could create transformations of my non-linear features, hand those directly to an sklearn LogisticRegression, and train the whole thing end to end, that would be sweet too. I don’t think that’s possible though?
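
As far as I know, sklearn's LogisticRegression does not pass gradients back to upstream steps, so it can take fixed transformations (splines, polynomial terms) but cannot train learned ones jointly. The end-to-end version described in this post could be sketched in PyTorch roughly as follows. The class name, layer sizes, and the split into "linear" and "non-linear" inputs are all illustrative assumptions.

```python
import torch
import torch.nn as nn


class HybridLogisticRegression(nn.Module):
    """Linear features feed the logit directly; non-linear features
    pass through a small MLP first, and both are concatenated."""

    def __init__(self, num_linear, num_nonlinear, hidden=8, embed=2):
        super().__init__()
        # Shallow network that learns a transformation of the non-linear features.
        self.transform = nn.Sequential(
            nn.Linear(num_nonlinear, hidden),
            nn.ReLU(),
            nn.Linear(hidden, embed),
        )
        # Final logistic-regression layer over [linear features, learned features].
        self.head = nn.Linear(num_linear + embed, 1)

    def forward(self, x_linear, x_nonlinear):
        z = self.transform(x_nonlinear.float())
        combined = torch.cat([x_linear.float(), z], dim=1)
        return self.head(combined).squeeze(-1)  # raw logits


model = HybridLogisticRegression(num_linear=4, num_nonlinear=3)
logits = model(torch.randn(16, 4), torch.randn(16, 3))
probs = torch.sigmoid(logits)  # probabilities for the positive class
```

Training this with a log-loss objective such as BCEWithLogitsLoss keeps the same loss that logistic regression uses, though calibration of the combined model would still need to be checked on held-out data.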

travellingbones · January 5, 2023, 8:21pm

I’m having the same problem using L1 regularization on the weighted Adult dataset. The scikit-learn implementation gets ~80% AUC using the liblinear solver, but I’m struggling to get over 70% in most runs with PyTorch under various optimization settings.
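
One possible factor with L1: adding an |w| term to the loss and running a standard optimizer rarely drives weights exactly to zero. A proximal gradient step (soft-thresholding after each update) is a common alternative. The sketch below is not what liblinear does internally; it uses synthetic data, placeholder sample weights, and hypothetical hyperparameters, and it averages the weighted loss by the sum of the sample weights.

```python
import torch
import torch.nn.functional as F


def soft_threshold(w, threshold):
    """Proximal operator of threshold * ||w||_1: shrink toward zero, clip at zero."""
    return torch.sign(w) * torch.clamp(w.abs() - threshold, min=0.0)


torch.manual_seed(0)

# Synthetic stand-in data: only the first two features carry signal.
N, D = 500, 6
X = torch.randn(N, D)
y = (torch.rand(N) < torch.sigmoid(2.0 * X[:, 0] - X[:, 1])).float()
sample_weight = torch.ones(N)  # placeholder for the dataset's per-row weights

w = torch.zeros(D, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
lr, l1_strength = 0.1, 0.05  # hypothetical values

for _ in range(500):
    logits = X @ w + b
    # Weighted log-loss, normalized by the total sample weight.
    loss = F.binary_cross_entropy_with_logits(
        logits, y, weight=sample_weight, reduction="sum"
    ) / sample_weight.sum()
    grad_w, grad_b = torch.autograd.grad(loss, [w, b])
    with torch.no_grad():
        w -= lr * grad_w  # gradient step on the data loss only
        b -= lr * grad_b  # intercept is not penalized
        w.copy_(soft_threshold(w, lr * l1_strength))  # L1 proximal step
```

With the loss normalized this way, an sklearn C would correspond to l1_strength = 1 / (C * sum of sample weights), by the same rescaling argument as for the L2 case.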