In some of my recent projects I have found that when training my model, the loss decreases very slowly, no matter the size of the dataset.
And by slow I mean the loss falls by about 1e-3 every 10 epochs.
I have tried many things, like changing the number of layers and the overall architecture, and sometimes one of these experiments helps, but the loss still does not fall the way I believe it should. When I let it train for long periods I get a plot like the one attached here (x: epoch, y: mean loss for that epoch):
I have also tried playing with the features, including different scalers and feature engineering. In the plot above I am using a MinMaxScaler. When I tested the features with a linear regression, all the p-values were < 0.05 with an R² of 98%, so I don't think the features are the problem.
In the past I have been told to check whether I can overfit a single batch. When I train on a single batch, the model does fit properly, but when I use the entire training dataset, the slow decrease above occurs.
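For context, the single-batch sanity check I ran looks roughly like this. This is only a minimal sketch, assuming PyTorch; the model, sizes, and data here are placeholders, not my actual code:

```python
# Minimal "overfit a single batch" sanity check (PyTorch sketch;
# the model, feature count, and data below are illustrative placeholders).
import torch
import torch.nn as nn

torch.manual_seed(0)

# One fixed batch: 32 samples, 8 features, one continuous target.
X = torch.randn(32, 8)
y = torch.randn(32, 1)

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

initial_loss = loss_fn(model(X), y).item()
for step in range(500):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()
final_loss = loss.item()

# On a single repeated batch the loss should fall well below its starting value.
print(initial_loss, final_loss)
```

On a single batch like this, the loss drops steadily, which is what I see as well; the problem only appears on the full dataset.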
I would really appreciate some help with this. I have added some relevant code below:
I am using the Adam optimizer and MSE loss, as I am trying to predict a single continuous value.
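To make the setup concrete, the training loop is structured roughly like the sketch below, which tracks the per-epoch mean loss shown on the plot's y-axis. This is a simplified illustration assuming PyTorch; the dataset, model, batch size, and learning rate are all placeholders, not my real values:

```python
# Simplified full-dataset training loop with Adam + MSE (PyTorch assumed;
# the synthetic data, model, and hyperparameters are placeholders).
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Placeholder dataset: 1000 samples, 8 MinMax-scaled-style features in [0, 1],
# with a synthetic continuous target so the loop is runnable end to end.
X = torch.rand(1000, 8)
y = X.sum(dim=1, keepdim=True)
loader = DataLoader(TensorDataset(X, y), batch_size=64, shuffle=True)

model = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

epoch_means = []
for epoch in range(5):
    total, n = 0.0, 0
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        optimizer.step()
        total += loss.item() * xb.size(0)
        n += xb.size(0)
    # Mean loss for the epoch: this is what the attached plot shows per epoch.
    epoch_means.append(total / n)

print(epoch_means)
```

With my real data, the per-epoch means from a loop like this are what decrease by only about 1e-3 every 10 epochs.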
Thank you in advance!