I am trying to use a simple 3-layer neural net to predict a scalar output given an input of dimension 430. For my network, I use two hidden layers of dimensions 600 and 80 with LeakyReLU non-linearities. I also tried to do the regression using ridge regression, and my ridge solution is better than my PyTorch implementation. I have been playing around with layers, dimensions, non-linearities, and the learning rate, but none of it improved the results. I was wondering if this is to be expected and, if not, what else you would suggest analyzing.
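For reference, a minimal sketch of the architecture described above (the exact layer setup in my code may differ slightly, but the shapes are 430 → 600 → 80 → 1):

```python
import torch
import torch.nn as nn

# Sketch of the described net: input 430, hidden layers 600 and 80,
# LeakyReLU non-linearities, scalar output.
model = nn.Sequential(
    nn.Linear(430, 600),
    nn.LeakyReLU(),
    nn.Linear(600, 80),
    nn.LeakyReLU(),
    nn.Linear(80, 1),
)

x = torch.randn(16, 430)  # dummy batch of 16 inputs
y = model(x)
print(y.shape)  # torch.Size([16, 1])
```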
If you have a working model, e.g. sklearn.linear_model.Ridge, make sure to dig a bit into the model, and then you could try to reimplement it in PyTorch.
A lot of sklearn models use some regularization, which has proved to work well, while these techniques are often forgotten in custom PyTorch implementations.
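As a concrete example of the point above: sklearn's Ridge applies an L2 penalty by default (alpha=1.0), whereas a plain PyTorch MSE training loop has no penalty unless you add one. A quick sketch on dummy data:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 430))
w_true = rng.normal(size=430)
y = X @ w_true + 0.1 * rng.normal(size=100)

# Ridge minimizes ||Xw - y||^2 + alpha * ||w||^2; alpha defaults to 1.0.
# This built-in regularization is easy to forget when porting to PyTorch.
model = Ridge(alpha=1.0)
model.fit(X, y)
print(model.alpha, model.coef_.shape)
```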
Thanks @ptrblck! I fixed the regularization, but the error did not get much better. I was in fact using RidgeCV from sklearn, but the idea is the same. I looked at the source code (thanks, that is always good research-strategy advice) and I think the only remaining difference is that sklearn uses SVD to find the coefficients: it takes the derivative of the loss with respect to the weights and expresses the solution in terms of the SVD decomposition matrices.
As far as I understand, backprop does that for me in a network anyway, and I don’t care about how the weights are updated beyond choosing my loss function. I could implement this in PyTorch, but if I use SVD directly then it is no longer a network.
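For completeness, the closed-form ridge solution via SVD can be written in a few lines of PyTorch (this is a sketch of the math, not sklearn's actual code, and it ignores the intercept, which sklearn handles separately and does not penalize):

```python
import torch

def ridge_svd(X, y, lmbd):
    # Closed-form minimizer of ||Xw - y||^2 + lmbd * ||w||^2.
    # With X = U diag(s) Vh, the solution is w = V diag(s / (s^2 + lmbd)) U^T y.
    U, s, Vh = torch.linalg.svd(X, full_matrices=False)
    d = s / (s ** 2 + lmbd)
    return Vh.T @ (d * (U.T @ y))

# Sanity check against the normal equations (X^T X + lmbd I) w = X^T y.
torch.manual_seed(0)
X = torch.randn(50, 10, dtype=torch.float64)
y = torch.randn(50, dtype=torch.float64)
lmbd = 0.5
w_svd = ridge_svd(X, y, lmbd)
w_ne = torch.linalg.solve(
    X.T @ X + lmbd * torch.eye(10, dtype=torch.float64), X.T @ y
)
print(torch.allclose(w_svd, w_ne, atol=1e-8))  # True
```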
For the interested future reader, as can be seen in Bishop’s PRML book:

> The particular case of a quadratic regularizer is called ridge regression (Hoerl and Kennard, 1970). In the context of neural networks, this approach is known as weight decay.
So we can either set the weight_decay parameter in the optimizer to a non-zero value, or we can add an L2 penalty to the loss ourselves as follows:
```python
criterion = torch.nn.MSELoss()
lmbd = 1e-8  # strength of the custom L2 regularization

loss = criterion(y_pred, y_train)
reg_loss = None
for param in model.parameters():
    # param.norm(2) ** 2 equals torch.sum(param ** 2); use one form throughout
    if reg_loss is None:
        reg_loss = torch.sum(param ** 2)
    else:
        reg_loss = reg_loss + torch.sum(param ** 2)
loss += lmbd * reg_loss
```
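The optimizer-based alternative is a one-liner. A sketch, using a plain SGD optimizer on some model (with plain SGD, weight_decay adds lmbd * w to each gradient, i.e. an L2 penalty of (lmbd / 2) * ||w||^2 on the loss; note that Adam couples the decay with its adaptive step, which is why AdamW exists):

```python
import torch

model = torch.nn.Linear(430, 1)  # stand-in for any model

# weight_decay applies L2 regularization inside the update step,
# so no extra term is needed in the loss.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, weight_decay=1e-8)
print(optimizer.defaults["weight_decay"])  # 1e-08
```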