What should I do/try if my regression model gets stuck at a high loss value?

I’m using neural nets in my project. It’s a regression problem where I have 3 features and I’m trying to predict one continuous value. I noticed that my neural net starts learning well, but after 10 epochs it gets stuck at a high loss value and cannot improve any further. I tried Adam and other adaptive optimizers instead of SGD, but that didn’t work. I tried more complex architectures, adding layers, neurons, batch normalization, other activations, etc., and that didn’t work either.

I also tried to debug and find out whether something is wrong with the implementation: when I use only 10 examples of the data, my model learns fast, so there don’t seem to be any errors there. I then increased the number of examples while monitoring my model’s results, and at around 3000 examples the model starts to get stuck at a high loss. My real dataset has 40000 examples. I don’t know what else to try; I’ve tried almost everything I know for optimization and none of it worked. I would appreciate it if someone could guide me on this. I’ll post my code below; it may be a bit messy, but I’m fairly sure the problem is not in the implementation. I’m using skorch/PyTorch and some scikit-learn functions:

import numpy as np
import torch
from torch import nn, optim
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score, make_scorer
from skorch import NeuralNetRegressor
from skorch.dataset import Dataset
from skorch.helper import predefined_split
from skorch.callbacks import EpochScoring, Checkpoint, TrainEndCheckpoint

# `dataset` (a pandas DataFrame) and `parser` (argparse) are defined elsewhere.

# take all features as independent variables, except the last two columns
# (bearing and distance).
# when I train on a small subset the model learns well, but from ~3000 data
# points on it gets stuck: the loss starts at 15, decreases nicely, then
# plateaus around 9.
# with the whole dataset the loss starts at 47, decreases to 36, and then
# gets stuck there too.
X = dataset.iloc[:3000, 0:-2].reset_index(drop=True).to_numpy().astype(np.float32)

# take distance and bearing as the output values:
y = dataset.iloc[:3000, -2:].reset_index(drop=True).to_numpy().astype(np.float32)
y_bearing = y[:, 0].reshape(-1, 1)
y_distance = y[:, 1].reshape(-1, 1)

# normalize the input features (StandardScaler ignores y, so pass only X)
scaler = StandardScaler()
X_norm = scaler.fit_transform(X)

X_br_train, X_br_test, y_br_train, y_br_test = train_test_split(X_norm,
                                                                y_bearing,
                                                                test_size=0.1,
                                                                random_state=42,
                                                                shuffle=True)

X_dis_train, X_dis_test, y_dis_train, y_dis_test = train_test_split(X_norm,
                                                                    y_distance,
                                                                    test_size=0.1,
                                                                    random_state=42,
                                                                    shuffle=True)
bearing_trainset = Dataset(X_br_train, y_br_train)
bearing_testset = Dataset(X_br_test, y_br_test)

distance_trainset = Dataset(X_dis_train, y_dis_train)
distance_testset = Dataset(X_dis_test, y_dis_test)


def root_mse(y_true, y_pred):
    return np.sqrt(mean_squared_error(y_true, y_pred))


class RMSELoss(nn.Module):
    def __init__(self):
        super().__init__()
        self.mse = nn.MSELoss()

    def forward(self, yhat, y):
        return torch.sqrt(self.mse(yhat, y))


class AED(nn.Module):
    """Custom average Euclidean distance loss."""
    def __init__(self):
        super().__init__()

    def forward(self, yhat, y):
        # torch.dist(yhat, y) returns a single norm over the whole batch,
        # which grows with the batch size; average the per-sample distances
        # instead, as the docstring promises.
        return torch.norm(yhat - y, dim=1).mean()


def train(on_target,
          hidden_units,
          batch_size,
          epochs,
          optimizer,
          lr,
          regularisation_factor,
          train_shuffle):

    network = None
    trainset = distance_trainset if on_target.lower() == 'distance' else bearing_trainset
    testset = distance_testset if on_target.lower() == 'distance' else bearing_testset
    print(f"shape of trainset.X = {trainset.X.shape}, shape of trainset.y = {trainset.y.shape}")
    print(f"shape of testset.X = {testset.X.shape}, shape of testset.y = {testset.y.shape}")

    # EpochScoring expects a scoring string or an sklearn-style scorer,
    # so wrap the raw metric functions with make_scorer
    mse = EpochScoring(scoring=make_scorer(mean_squared_error), lower_is_better=True, name='MSE')
    r2 = EpochScoring(scoring=make_scorer(r2_score), lower_is_better=False, name='R2')
    rmse = EpochScoring(scoring=make_scorer(root_mse), lower_is_better=True, name='RMSE')

    checkpoint = Checkpoint(dirname=f'results/{on_target}/checkpoints')
    train_end_checkpoint = TrainEndCheckpoint(dirname=f'results/{on_target}/checkpoints')

    if on_target.lower() == 'bearing':
        network = BearingNetwork(n_features=X_norm.shape[1],
                                 n_hidden=hidden_units,
                                 n_out=y_bearing.shape[1])

    elif on_target.lower() == 'distance':
        network = DistanceNetwork(n_features=X_norm.shape[1],
                                  n_hidden=hidden_units,
                                  n_out=1)

    model = NeuralNetRegressor(
        module=network,
        criterion=RMSELoss,
        device='cpu',
        batch_size=batch_size,
        lr=lr,
        optimizer=optim.Adam if optimizer.lower() == 'adam' else optim.SGD,
        #optimizer__momentum=0.9,
        optimizer__weight_decay=regularisation_factor,
        max_epochs=epochs,
        iterator_train__shuffle=train_shuffle,
        # iterator_train__pin_memory=True,
        # iterator_train__num_workers=4,
        # iterator_valid__shuffle=True,
        # iterator_valid__pin_memory=True,
        # iterator_valid__num_workers=4,

        train_split=predefined_split(testset),
        callbacks=[mse, r2, rmse, checkpoint, train_end_checkpoint]
    )

    print(f"{'*' * 10} start training the {on_target} model {'*' * 10}")
    model.fit(trainset, y=None)  # fit returns the net itself; the metrics live in model.history

    print(f"{'*' * 10} End Training the {on_target} Model {'*' * 10}")


if __name__ == '__main__':

    args = parser.parse_args()

    train(on_target=args.on_target,
          hidden_units=args.hidden_units,
          batch_size=args.batch_size,
          epochs=args.epochs,
          optimizer=args.optimizer,
          lr=args.learning_rate,
          regularisation_factor=args.regularisation_lambda,
          train_shuffle=args.shuffle)

and this is my network declaration:

class DistanceNetwork(nn.Module):
    """separate NN for predicting distance"""
    def __init__(self, n_features=5, n_hidden=16, n_out=1):
        super().__init__()
        self.model = nn.Sequential(

            nn.Linear(n_features, n_hidden),
            nn.LeakyReLU(),
            nn.Linear(n_hidden, 5),
            nn.LeakyReLU(),
            # nn.Linear(n_hidden, n_hidden),
            # nn.ReLU(),
            # nn.Linear(n_hidden, n_hidden),
            # nn.ReLU(),
            nn.Linear(5, n_out)
        )
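
For reference, the scaling-up experiment I described above is essentially this loop (the subset sizes and epoch count here are illustrative; it reuses the classes from the snippet):

for n_examples in (10, 100, 1000, 3000, 10000):
    X_sub = dataset.iloc[:n_examples, 0:-2].to_numpy().astype(np.float32)
    y_sub = dataset.iloc[:n_examples, -2:].to_numpy().astype(np.float32)
    X_sub = StandardScaler().fit_transform(X_sub).astype(np.float32)
    net = NeuralNetRegressor(module=DistanceNetwork(n_features=X_sub.shape[1]),
                             criterion=RMSELoss,
                             optimizer=optim.Adam,
                             lr=1e-3,
                             max_epochs=50,
                             train_split=None,  # training loss is enough for this check
                             verbose=0)
    net.fit(X_sub, y_sub[:, 1].reshape(-1, 1))  # distance column
    print(n_examples, net.history[-1, 'train_loss'])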

PS: I have already tried adding layers and neurons, other activations, and batch normalization. My input features are also normalized to roughly [-1, 1]; my target value is not normalized, since this is regression and I’m predicting a continuous value. I appreciate any help; I’ve been trying to solve this for a long time now, so any suggestion will help. Thanks in advance.

Hello,

Could you provide more details on your data? For example, what are the 3 features you are working with (or, if it is a public dataset, which dataset is it)? Also, do you have plots of your training/validation loss?

Hi, no, the data is not public. It comes from a vehicle, and the 3 features are speed, longitudinal acceleration, and lateral acceleration; from those I need to build a model that predicts a continuous value. The network starts to learn well in the first epochs but then stabilizes at a high loss: it starts at a loss of 90, learns until the loss reaches 40, and then stays there, slowly oscillating up and down. A plot would show a sharp decrease in the first epochs followed by a plateau at 40, for both the training and the validation loss, which means the model fails to even overfit the data. I don’t know the reason for this. Furthermore, I have tried techniques like random forest, and it gave better performance on both train and validation; it overfitted the data, but it is still better than the neural nets. The problem is that I must use neural nets for this task. I hope I explained it well.

Ok, have you measured the accuracy of your model? And what loss values did you get when training on fewer examples, and when you tried decision trees? Looking at your code, everything seems fine, so my intuition is that either there is something wrong in your data or neural networks are not the appropriate model for your problem. Did you try a simple linear regression?
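
Something along these lines would already work as a sanity check (a sketch, reusing the arrays and split from your snippet):

from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# plain linear baseline on the same distance split as the neural net
lin = LinearRegression().fit(X_dis_train, y_dis_train)
pred = lin.predict(X_dis_test)
print("RMSE:", mean_squared_error(y_dis_test, pred) ** 0.5)
print("R2:", r2_score(y_dis_test, pred))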

Thanks for the answer. What do you mean by accuracy? I’m building a regression model, so accuracy doesn’t make sense in my case. If you mean something like the R2 score, then yes: my neural network reaches 0.15, while, surprisingly, random forest gives 0.9 on the training data but only 0.3 on the validation data, which is still a lot better than the neural net (and that confuses me even more). Regarding your suggestion to use linear regression: would that make sense, given that my data has strong nonlinearity? I tried it anyway, and it wasn’t able to fit the data; it gave a score of 0.005 and a high loss. I also tried SVR, and it didn’t work well either. The only machine learning approach that has worked so far (not well, but better than the others) is the random forest regressor, but as I said it overfits the data, which is also bad; I tried cross-validation too, and it still always overfits no matter what I do. Now, about your suggestion that something might be wrong with the data: how can I check that, and what exactly would it mean? I would think a neural network should be able to map any nonlinearity, maybe not a perfect fit, but at least a good one, unlike in my case.

Ah, my bad about accuracy (I mostly work with classification problems), so yes, the R2 score in that case! And yes, given the nonlinearity in your data, linear regression does not make much sense. What I mean by a problem with your data is, for example, something in the normalization or in the nature of the problem itself. Are there other methods known to work on the same dataset you use? Also, did you try your network with only one of the three features?
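
On the normalization point, one thing I would check is the target: you leave it in its raw units, and scaling it too (then inverting the scaling at prediction time) sometimes makes the optimization noticeably easier. A sketch of what I mean, reusing the names from your snippet (`model` stands for your NeuralNetRegressor):

from sklearn.preprocessing import StandardScaler

# scale the target as well, then undo the scaling on the predictions
y_scaler = StandardScaler()
y_dis_train_scaled = y_scaler.fit_transform(y_dis_train).astype(np.float32)

model.fit(X_dis_train, y_dis_train_scaled)
pred = y_scaler.inverse_transform(model.predict(X_dis_test))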

What bugs me is that your network does seem to learn something (the loss decreases and then stays constant for many epochs), yet it just cannot push the loss below 40, which suggests the network has settled into some minimum.

The normalization I used is the z-score, but I also tried min-max scaling, and both gave the same results! The project is a research project, so there are not many resources on it, if any. The last thing I can think of is using an LSTM, but I have somewhat lost hope in NNs since random forest gave better results. That said, I spent the whole day trying to reduce the overfitting of the random forest regressor and didn’t manage it. As you suggested, I also tried fitting with only one feature, but that didn’t work; in fact, it sometimes gave worse performance.
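
For reference, the kind of constraints I experimented with on the forest looked roughly like this (the exact values varied, and none of them closed the train/validation gap):

from sklearn.ensemble import RandomForestRegressor

rf = RandomForestRegressor(n_estimators=200,
                           max_depth=8,           # limit tree depth
                           min_samples_leaf=20,   # force larger leaves
                           max_features='sqrt',   # decorrelate the trees
                           random_state=42)
rf.fit(X_dis_train, y_dis_train.ravel())
print(rf.score(X_dis_train, y_dis_train.ravel()),
      rf.score(X_dis_test, y_dis_test.ravel()))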

Yes, my net learns in the first epochs and then somehow stops improving; I like to think of it as struggling to get out of, or rather break through, that loss value. I tried a cyclic LR and an LR scheduler, but those didn’t work either. I’m running out of ideas :confused:
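
In skorch the schedule was wired in through a callback, roughly like this (a sketch; the hyperparameter values are illustrative):

from skorch.callbacks import LRScheduler
from torch.optim.lr_scheduler import CyclicLR

cyclic_lr = LRScheduler(policy=CyclicLR,
                        base_lr=1e-4,
                        max_lr=1e-2,
                        step_size_up=200,
                        step_every='batch')  # CyclicLR steps once per batch
# then add `cyclic_lr` to the callbacks list of the NeuralNetRegressor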

I’m afraid I’m not being very useful for your problem; you seem to have tried a lot of things, and I’m clearly out of my depth here. If you have tried different architectures and the same problem persists, could it be that your dataset simply cannot be learned from? I wish you the best of luck!

Thanks anyway :slight_smile:. An NN should actually be able to map any input/output relationship, up to the nature of the data and the randomness/noise in it (at least that’s what I thought). I have done some projects before, so I’m not new to NNs, and that’s what bothers me the most: I can’t figure out what is wrong with my model. Thanks for the discussion; hopefully someone will see this and can help me further.