Hi! I am trying to implement a PyTorch-based Lasso regression, but I could not confirm that the resulting weight matrix is sparse.
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn import linear_model


class Lasso(nn.Module):
    "Lasso for compressing dictionary"

    def __init__(self, input_size):
        super().__init__()
        self.linear = nn.Linear(input_size, 1, bias=False)

    def forward(self, x):
        out = self.linear(x)
        return out


def lasso(x, y, lr=0.005, max_iter=2000, tol=1e-4, opt='sgd'):
    # x = x.detach()
    # y = y.detach()
    lso = Lasso(x.shape[1])  # input_size is the number of features
    criterion = nn.MSELoss(reduction='sum')
    if opt == 'adam':
        optimizer = optim.Adam(lso.parameters(), lr=lr)
    elif opt == 'adagrad':
        optimizer = optim.Adagrad(lso.parameters(), lr=lr)
    else:
        optimizer = optim.SGD(lso.parameters(), lr=lr)
    w_prev = torch.tensor(0.)
    for it in range(max_iter):
        optimizer.zero_grad()
        out = lso(x)
        loss = criterion(out, y)
        l1_norm = 0.1 * torch.norm(lso.linear.weight, p=1)
        loss += l1_norm
        loss.backward()
        optimizer.step()
        # clone() so that w_prev is not overwritten by the next in-place optimizer step
        w = lso.linear.weight.detach().clone()
        if bool(torch.norm(w_prev - w) < tol):
            break
        w_prev = w
        # if it % 100 == 0:
        #     print(loss.item() - loss_prev)
    return lso.linear.weight.detach()


a = torch.randn(4, 60)
b = torch.randn(4, 1)
r = lasso(a, b, opt='adam')
l = linear_model.Lasso(alpha=0.1, fit_intercept=False)
l.fit(a, b)
# l.path(a, b, verbose=True)
print(r)
print(l.coef_)
And the printed results are:
tensor([[ 0.4202, 0.4335, 0.1444, -0.0708, -0.3143]])
[ 0.312021 -0.57850013 -0. -0. 0. ]
I am comparing my implementation with the sklearn Lasso implementation. I have found some answers on this forum and on Stack Exchange as well, but I still have this problem. Is there anything wrong with my code?
Thank you very much!
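One thing worth checking when comparing the two outputs is that the objectives are scaled differently. The code in the question adds 0.1 times the L1 norm to a summed squared error, while sklearn's `Lasso` documents its objective with a 1/(2n) factor on the squared error (n is the number of samples). The symbols λ and α below are just labels for the two penalty weights; this is a side-by-side of the two objectives, not a diagnosis of the posted results:

```latex
\begin{aligned}
\text{PyTorch code in the question:}\quad & \min_w \; \|y - Xw\|_2^2 + \lambda \|w\|_1, \qquad \lambda = 0.1 \\
\text{sklearn \texttt{Lasso}:}\quad & \min_w \; \frac{1}{2n} \|y - Xw\|_2^2 + \alpha \|w\|_1 \\
\text{same minimizer when:}\quad & \alpha = \frac{\lambda}{2n}
\end{aligned}
```

So with the same numeric value 0.1 for both penalty weights, sklearn penalizes the weights 2n times more strongly relative to the data term, and the two solutions should not be expected to match.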
I think the main problem you are running into is that the “lasso” problem can be solved much better by specialized algorithms than by Adam or Adagrad.
And sklearn in particular uses finely tuned algorithms that solve these problems very well.
You can try monitoring the loss of your network, making sure your learning rate is high enough and that you only return after you have converged. But that might take some time. Also, the values will never be exactly 0, but they should go down to something small…
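If exact zeros are needed, one common alternative (not something proposed in this thread) is proximal gradient descent, also known as ISTA: take a gradient step on the squared-error term only, then apply soft-thresholding, which sets small weights to exactly zero. Below is a minimal sketch, assuming sklearn's documented objective (1/(2n))·||y − Xw||² + alpha·||w||₁. The names `soft_threshold` and `lasso_ista` are made up for illustration and are not part of PyTorch.

```python
import torch


def soft_threshold(w, thresh):
    # Proximal operator of thresh * ||w||_1: shrink every entry toward zero
    # and set entries with |w| <= thresh to exactly zero.
    return torch.sign(w) * torch.clamp(w.abs() - thresh, min=0.0)


def lasso_ista(x, y, alpha=0.1, max_iter=5000, tol=1e-8):
    # Hypothetical helper: minimizes (1 / (2 n)) * ||y - x w||^2 + alpha * ||w||_1.
    n, d = x.shape
    w = torch.zeros(d, 1, dtype=x.dtype)
    # Step size 1/L, where L is the largest eigenvalue of x^T x / n.
    lipschitz = torch.linalg.matrix_norm(x, ord=2) ** 2 / n
    step = 1.0 / lipschitz
    for _ in range(max_iter):
        grad = x.t() @ (x @ w - y) / n  # gradient of the smooth part only
        w_new = soft_threshold(w - step * grad, step * alpha)
        converged = torch.norm(w_new - w) < tol
        w = w_new
        if converged:
            break
    return w
```

Calling `lasso_ista(a, b, alpha=0.1)` with an `(n, d)` input and an `(n, 1)` target returns a `(d, 1)` weight tensor. Unlike the Adam loop, no autograd or optimizer object is involved, since the update is written out by hand.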
Any plans to introduce these efficient lasso algorithms into pytorch?
This is definitely something we would be interested in having. But we don’t have anyone working on this at the moment.
But we would be very happy to accept contributions for this (and help with design if needed).
Thank you for your reply!
It would be great to have “simple” things like Lasso regression (or Ridge etc.) implemented. I think (my personal understanding) sklearn may have a more complete coverage of things (not only the fancy DNNs but other things as well) than pytorch. I could be wrong, because I am quite new to pytorch but I like it very much and would definitely want it to be better in the future!
Yes, a port of the basic sklearn algorithms to PyTorch would be very useful. It would also give us GPU-ready versions of the sklearn algorithms for free.
I am indeed looking forward to it! Thank you!
Sklearn uses a coordinate descent algorithm. An exact, sparse solution is reached after a finite number of iterations. No stochasticity or SGD involved. Fairly different framework from back-propagation.
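To make the contrast with back-propagation concrete, here is a rough sketch of cyclic coordinate descent on tensors. It is not sklearn's actual implementation (which is compiled and uses its own stopping criterion); it only assumes sklearn's documented objective (1/(2n))·||y − Xw||² + alpha·||w||₁, and the name `lasso_cd` is hypothetical. Each coordinate is solved exactly in closed form by soft-thresholding, which is where the exact zeros come from.

```python
import torch


def lasso_cd(x, y, alpha=0.1, max_iter=1000, tol=1e-6):
    # Hypothetical helper: cyclic coordinate descent for
    # (1 / (2 n)) * ||y - x w||^2 + alpha * ||w||_1.
    n, d = x.shape
    y = y.reshape(-1)
    w = torch.zeros(d, dtype=x.dtype)
    col_sq = (x ** 2).sum(dim=0) / n  # (1 / n) * ||x_j||^2 for each column j
    residual = y.clone()              # equals y - x @ w, since w starts at zero
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(d):
            if col_sq[j] == 0:
                continue  # an all-zero column cannot reduce the loss
            w_old = w[j].item()
            # Correlation of column j with the residual, with j's own
            # contribution added back in.
            rho = (x[:, j] @ residual) / n + col_sq[j] * w_old
            # Exact one-dimensional minimizer: soft-threshold, then rescale.
            shrunk = torch.sign(rho) * torch.clamp(rho.abs() - alpha, min=0.0)
            w_new = (shrunk / col_sq[j]).item()
            if w_new != w_old:
                residual -= x[:, j] * (w_new - w_old)
                w[j] = w_new
                max_change = max(max_change, abs(w_new - w_old))
        if max_change < tol:
            break
    return w
```

There is no learning rate and no gradient tape here: every inner step is an exact minimization over one weight, and the loop stops once a full sweep changes no weight by more than `tol`.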
This is something I am potentially interested in contributing, but for now I’d like to understand the added value of having Lasso-like solvers in PyTorch and if they could fit with the existing pytorch optimizers.
The main motivation is to make it easy to compare NN-based models with classical ML algorithms.
In particular, the more common code we can add here, the better.
Given that most of these algorithms have very different ways of going through the dataset, and specific stopping criteria, there was no plan to incorporate them as PyTorch optimizers, for which the main training loop is user-controlled.
But an API similar to sklearn's that takes a data.DataLoader as input could be interesting. In particular, that would allow sharing all the data loading and preprocessing, and only the training function would be different.
Also an implementation based on Tensor would allow GPU support for free!
Do let me know if you’re interested and want to discuss this further.
FWIW, "Large-Scale Machine Learning with Stochastic Gradient Descent" by Bottou gives an SGD algorithm for Lasso (and for other methods such as SVM and K-Means).
I agree with your Lasso implementation; torch’s pruning utilities could help with the sparsity. A simple example is prune.l1_unstructured, which sets the lowest-magnitude weights to zero based on a specified amount.
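A minimal sketch of that suggestion, using `torch.nn.utils.prune` on a stand-alone `nn.Linear` layer. The layer size and the 80% amount are arbitrary choices for illustration. Note that this is magnitude-based pruning applied after the fact, so it produces zeros but does not by itself give the Lasso solution.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

torch.manual_seed(0)
linear = nn.Linear(60, 1, bias=False)  # same shape as the layer in the question

# Zero out the 80% of weights with the smallest absolute value.
# This adds a mask buffer and re-parametrizes `weight` as weight_orig * mask.
prune.l1_unstructured(linear, name="weight", amount=0.8)

sparsity = (linear.weight == 0).float().mean().item()
print(f"fraction of zero weights: {sparsity:.2f}")

# Make the pruning permanent: drop the mask and keep the zeroed weights.
prune.remove(linear, "weight")
```

In practice this would be applied to the trained `lso.linear` layer from the question rather than to a freshly initialized one.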