def ridge_loss(Y, pred, w, lamb):
    ...
    # mean data-fit term plus L2 penalty; Y.size(0) is the sample count
    return (1 / Y.size(0)) * pred_loss + lamb * reg

def fit(lamb, X_pt, Y_pt, w_pt, epochs=5000, learning_rate=0.1):
    opt = torch.optim.Adam([w_pt], lr=learning_rate, betas=(0.9, 0.99), eps=1e-08)
    for epoch in range(epochs):
        pred = torch.matmul(X_pt, w_pt)
        loss = ridge_loss(Y_pt, pred, w_pt, lamb)
        ...

X_pt = torch.from_numpy(X)  # xtrain
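The fragment above omits the gradient steps inside the loop. For context, here is a minimal end-to-end sketch of the same setup on synthetic data; the synthetic X/Y, the true weights, and the zero_grad/backward/step lines are my assumptions, not the original code:

```python
import numpy as np
import torch

def ridge_loss(Y, pred, w, lamb):
    # mean squared error plus L2 penalty on the weights
    pred_loss = torch.sum((Y - pred) ** 2)
    reg = torch.sum(w ** 2)
    return (1 / Y.size(0)) * pred_loss + lamb * reg

def fit(lamb, X_pt, Y_pt, w_pt, epochs=5000, learning_rate=0.1):
    opt = torch.optim.Adam([w_pt], lr=learning_rate)
    for epoch in range(epochs):
        opt.zero_grad()                       # clear accumulated gradients
        pred = torch.matmul(X_pt, w_pt)
        loss = ridge_loss(Y_pt, pred, w_pt, lamb)
        loss.backward()                       # compute d(loss)/d(w_pt)
        opt.step()                            # update w_pt in place
    return w_pt

# synthetic standardized data; true weights [2.0, -3.0] are illustrative
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))
Y = X @ np.array([2.0, -3.0])
X_pt = torch.from_numpy(X)
Y_pt = torch.from_numpy(Y)
w_pt = torch.zeros(2, dtype=torch.float64, requires_grad=True)
w_fit = fit(0.0, X_pt, Y_pt, w_pt, epochs=2000, learning_rate=0.1)
```

With lamb=0 this is plain least squares, so the fitted weights should land close to the generating weights.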
I am new to PyTorch and want to learn how to use custom loss functions. To get started I implemented ridge regression, but my error values are much higher than sklearn's implementation of ridge regression. Can you please help me find the mistake in my loss function?
Sklearn most likely is not using first-order gradient descent to solve this. I can’t spot an error in your code, so maybe you just need to add lr decay (scheduler) - in general you should check if your loss decreases at a reasonable pace. Another possible issue is non-normalized data (i.e. epoch 0 prediction is too far off).
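If lr decay turns out to help, a scheduler from `torch.optim.lr_scheduler` can be attached to the optimizer. A minimal sketch with `StepLR` (the step size, gamma, and the stand-in loss are illustrative values, not from the thread):

```python
import torch

w = torch.zeros(3, requires_grad=True)
opt = torch.optim.Adam([w], lr=0.1)
# halve the learning rate every 1000 epochs (illustrative values)
sched = torch.optim.lr_scheduler.StepLR(opt, step_size=1000, gamma=0.5)

for epoch in range(3000):
    opt.zero_grad()
    loss = torch.sum(w ** 2)   # stand-in loss for the sketch
    loss.backward()
    opt.step()
    sched.step()               # advance the schedule once per epoch
```

After 3000 epochs the learning rate has been halved three times, i.e. 0.1 → 0.0125.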
@googlebot Thanks for replying.
- I will implement a scheduler.
- I tried printing the loss while gradient descent is running; it falls initially and then stays constant at a not-so-low value without any further change.
- My X is zero-mean, unit-variance (standard normal), so I think scaling shouldn't be an issue; please let me know if I've understood this wrong. What do you mean by "epoch 0 prediction is too far off"?
I think your 1/Y.size() term is incorrect, you’re overemphasizing L2 penalty.
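To make the two objectives comparable: sklearn's `Ridge` minimizes `||y - Xw||^2 + alpha * ||w||^2` (no 1/n on the data term), whereas the loss above minimizes `(1/n)||y - Xw||^2 + lamb * ||w||^2`. Dividing sklearn's objective by n shows the two have the same minimizer exactly when `lamb = alpha / n`. A quick check against the closed-form solution (synthetic data and `alpha` are illustrative):

```python
import numpy as np
from sklearn.linear_model import Ridge

# synthetic data (illustrative)
rng = np.random.default_rng(0)
n = 100
X = rng.standard_normal((n, 2))
y = X @ np.array([1.0, -2.0]) + 0.1 * rng.standard_normal(n)

alpha = 5.0
w_sklearn = Ridge(alpha=alpha, fit_intercept=False).fit(X, y).coef_

# minimizer of (1/n)||y - Xw||^2 + lamb*||w||^2 with lamb = alpha/n:
# solves (X^T X / n + lamb I) w = X^T y / n, same as sklearn's normal equations
lamb = alpha / n
w_closed = np.linalg.solve(X.T @ X / n + lamb * np.eye(2), X.T @ y / n)
```

The two weight vectors should agree to numerical precision.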
e.g. if true_y = x * 100 + b, but your w initialization range is something like -3…3 (and you don't model the bias at all). Accelerated optimizers help here, but that may not be enough for harder problems and mini-batches.
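One common way to model the bias without a separate parameter is to append a constant-1 column to X, so the last weight plays the role of b. A small numpy check of that trick on exact data from true_y = x * 100 + b (the data values and b = 7 are illustrative):

```python
import numpy as np

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 100.0 * X[:, 0] + 7.0                 # true_y = x * 100 + b, with b = 7

# append a ones column so the last weight is the intercept
X_aug = np.hstack([X, np.ones((X.shape[0], 1))])
w = np.linalg.lstsq(X_aug, y, rcond=None)[0]   # [slope, intercept]
```

On noise-free data the least-squares fit recovers the slope and intercept exactly; the same augmented X works unchanged in the PyTorch loop.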
@googlebot I figured out a bug in my error-metric function, which is why it was showing a higher error rate. The method converges without any need to schedule the learning rate. Thanks a lot for your help!