Logistic regression implemented using pytorch performs worse than sklearn's logistic regression

Hi, I implemented binary logistic regression in PyTorch with one linear layer and one sigmoid layer, optimized using BCELoss and the Adam optimizer. However, it performs worse than sklearn’s implementation of logistic regression with liblinear. More specifically, as the dimension of the samples grows, the PyTorch implementation becomes unstable and seems to be trapped in a local minimum, while sklearn performs fine. Is this because PyTorch’s optimization algorithm is different from sklearn’s? Adding another layer does not help with the performance. Since I really want to take advantage of PyTorch’s GPU support and the ability to build deep nets, I am wondering if anyone has encountered this issue before and how we can get PyTorch to work as well as sklearn. Thanks

The comparison between liblinear and your custom PyTorch model might be quite hard.
Did you use all parameters and methods from liblinear in your PyTorch model?

Yes, I encountered a similar issue.
In my analysis of one month of Twitter sentiment data (around 1800 samples), I got roughly 65 to 72% accuracy with logistic regression implemented via sklearn.

But the accuracy does not go beyond 48 to 49% when I implement logistic regression on the same data in PyTorch.

Wondering why?

I’m not sure how you’ve implemented your logistic regression in PyTorch, but your vanilla implementation might differ from sklearn’s default setup.
There was a discussion about these “magic” numbers and default setups on Twitter, where Zachary wondered whether users know that sklearn’s default logistic regression uses, e.g., L2 regularization.

In that sense, could you try to adapt the code bases to be as close as possible to each other?

This is an old topic, but it was still relevant to a project I’m working on, so I’ll share what I found while investigating:

There are three main differences between typical scikit-learn and pytorch implementations of logistic regression: the regularization, the data handling, and the optimizer.

ptrblck already touched on the regularization difference, namely that scikit-learn uses L2 weight regularization by default, while PyTorch models are unregularized by default.
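To make the regularization point concrete, here is a minimal sketch of adding an sklearn-style L2 penalty to a PyTorch loss. It assumes the objective given in the scikit-learn docs for the L2 penalty (C times the summed log-loss plus half the squared weight norm, with the intercept left unpenalized), which after dividing by C and the number of samples becomes mean BCE plus ||w||^2 / (2 * C * n). The helper name l2_penalized_bce and the toy data are made up for illustration; note also that liblinear treats the intercept differently, so this is not an exact match for that solver.

```python
import torch
import torch.nn as nn

def l2_penalized_bce(model, X, y, C=1.0):
    """Mean BCE plus an L2 penalty scaled like sklearn's C (assumed objective)."""
    logits = model(X).squeeze(1)
    # Numerically stable sigmoid + BCE in one call, averaged over samples.
    data_loss = nn.functional.binary_cross_entropy_with_logits(logits, y)
    # Penalize the weights only, not the bias.
    penalty = model.weight.pow(2).sum() / (2.0 * C * X.shape[0])
    return data_loss + penalty

# Toy data, purely for illustration.
torch.manual_seed(0)
model = nn.Linear(4, 1)
X = torch.randn(16, 4)
y = (X[:, 0] > 0).float()

loss_c1 = l2_penalized_bce(model, X, y, C=1.0)    # sklearn's default strength
loss_weak = l2_penalized_bce(model, X, y, C=1e6)  # close to unregularized
```

A very large C makes the penalty negligible, which is one way to check an "unregularized" comparison from both sides.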

The data handling doesn’t make much of a performance difference from what I’ve seen, but it is worth mentioning. SKL models call fit on the entire training set, while most PyTorch training loops use mini-batch gradient descent, updating on a fraction of the data at a time.
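For reference, the two data-handling styles differ only in the batch size passed to the loader. The sketch below uses synthetic data and arbitrary hyperparameters (all of them assumptions, not anything from the posts above) to train the same model once with full-batch updates and once with mini-batches.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
# Synthetic, linearly separable data (illustrative only).
X = torch.randn(200, 5)
y = (X[:, 0] + 0.5 * X[:, 1] > 0).float()

def train(batch_size, epochs=50, lr=0.05):
    """Train a one-layer logistic regression and return the final training loss."""
    model = nn.Linear(5, 1)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    loader = DataLoader(TensorDataset(X, y), batch_size=batch_size, shuffle=True)
    for _ in range(epochs):
        for xb, yb in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(xb).squeeze(1), yb)
            loss.backward()
            optimizer.step()
    with torch.no_grad():
        return loss_fn(model(X).squeeze(1), y).item()

full_batch_loss = train(batch_size=len(X))  # one update per epoch, like a single fit on all data
mini_batch_loss = train(batch_size=32)      # several updates per epoch
```

With the same number of epochs the mini-batch run takes more optimizer steps, so the two losses are not directly comparable step for step.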

The most important factor (at least for my problem) was the difference between optimizers. SKL defaults to LBFGS, a quasi-Newton method that approximates second-order information, while PyTorch models typically use Adam, a first-order method. I found that I was able to get better performance out of my logistic regression model by using the PyTorch LBFGS optimizer. It still failed to reach the same accuracy as an unregularized scikit-learn logistic regression, though, so there may be more to the story.
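For anyone who wants to try the same thing, here is a rough sketch of full-batch training with torch.optim.LBFGS. Unlike Adam, it needs a closure that re-evaluates the loss, because the optimizer calls it several times per step. The data, the max_iter value, and the choice of the strong Wolfe line search are assumptions for illustration, not the settings from my project.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Synthetic data with noisy labels, so the optimum is finite (illustrative only).
X = torch.randn(200, 5)
true_w = torch.tensor([1.5, -2.0, 0.0, 0.5, 0.0])
y = torch.bernoulli(torch.sigmoid(X @ true_w))

model = nn.Linear(5, 1)
loss_fn = nn.BCEWithLogitsLoss()
optimizer = torch.optim.LBFGS(
    model.parameters(),
    lr=1.0,
    max_iter=100,                    # inner iterations per call to step()
    line_search_fn="strong_wolfe",
)

def closure():
    # LBFGS re-evaluates the loss and gradients through this function.
    optimizer.zero_grad()
    loss = loss_fn(model(X).squeeze(1), y)
    loss.backward()
    return loss

initial_loss = closure().item()
optimizer.step(closure)  # runs up to max_iter LBFGS iterations on the full batch
final_loss = closure().item()
```

Since LBFGS works on the whole dataset at once here, this setup also removes the mini-batch difference mentioned above.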