Constant Predictions in Non-Linear Model Despite Training Progress

Hello,

I’m working on a non-linear regression problem using a simple neural-network model (an LSTM), and I’m encountering a strange issue with my predictions. The model is designed to predict a single output parameter (delta_sigma) from two input parameters (sigma_t and delta_epsilon). I’ve normalized the data and tested different hyperparameters, but I’m still getting constant prediction values, no matter what the inputs are.

Problem Description:

  • The model appears to learn at first: both the training loss and the validation loss decrease significantly over time, which seems normal.
  • However, after some time the predictions stabilize and stay constant even when I change the input values. The constant is usually around 1000 and can fluctuate a little between runs, but right now the output is simply the same for every input.
  • I’ve experimented with different hyperparameters, such as the learning rate and the network architecture, but that hasn’t resolved the issue: the model keeps outputting the same value for every input.

Steps I’ve Taken:

  • I started with simple code and gradually added more complexity to explore potential causes, such as overfitting or issues with normalization.
  • I’ve also seen posts mentioning data-processing issues as a potential cause of this behavior, but I can’t pinpoint what might be wrong with my data-processing pipeline.
  • The dataset is small, and even though the loss looks good (both the validation loss and the overall training loss), the final predictions are all identical.

Request:

  • I’d appreciate any help in identifying what might be going wrong.
  • I’m attaching a link to my GitHub repository where you can find the essential details in the README (the notebook specifically explains the process and shows what happens in the last iteration).
  • The files themselves aren’t necessary to look at; the README should be enough to understand the issue.

Important Code Blocks:

Here are some key pieces of the code that might help in understanding the issue:

Model Definition:

import torch
import torch.nn as nn

class LSTMModel(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, output_size):
        super(LSTMModel, self).__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)
        self.relu = nn.ReLU()  # note: defined here but not used in forward()

    def forward(self, x):
        _, (hn, _) = self.lstm(x)   # hn: (num_layers, batch, hidden_size)
        out = self.fc(hn[-1])       # use the last layer's final hidden state
        return out

This LSTM-based architecture is simple and designed to predict delta_sigma from the inputs sigma_t and delta_epsilon.
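For reference, a quick shape check on a dummy batch looks like the following (the hidden size, number of layers, and sequence length here are placeholders, not my actual values):

import torch

# Placeholder hyperparameters, for illustration only.
model = LSTMModel(input_size=2, hidden_size=32, num_layers=1, output_size=1)

# Dummy batch of 4 samples, sequence length 1, two input features (sigma_t, delta_epsilon).
x = torch.randn(4, 1, 2)
print(model(x).shape)  # torch.Size([4, 1]) -> one delta_sigma prediction per sample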

Data Normalization:

def min_max_normalize(tensor):
    """
    Scales the tensor to [0, 1] using the global min_val and max_val.
    On the first call (when both are None) the statistics are computed
    from the given tensor and cached for later calls.
    """
    global min_val, max_val
    if min_val is None or max_val is None:
        min_val = torch.min(tensor)
        max_val = torch.max(tensor)
    return (tensor - min_val) / (max_val - min_val)

Data normalization ensures that all inputs are on the same scale. However, the output still seems to stay constant despite this normalization.
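For illustration, this is how the cached statistics get set and how a value can be mapped back to the original scale afterwards (x_raw is a placeholder for one of my raw data tensors, and min_max_denormalize is just an illustrative helper name, not something from the repository):

min_val, max_val = None, None        # reset the cached statistics
x_norm = min_max_normalize(x_raw)    # the first call caches min_val / max_val from x_raw

def min_max_denormalize(tensor):
    # Inverse of min_max_normalize, e.g. for mapping model outputs back to physical units.
    return tensor * (max_val - min_val) + min_val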

Training Loop with Early Stopping:

# model, criterion, optimizer, train_dataloader and val_dataloader are defined earlier
# in the notebook; improvement_block toggles the validation / early-stopping branch.
best_val_loss = float('inf')
patience = 20
patience_counter = 0
epochs = 10000

for epoch in range(epochs):
    model.train()
    running_loss = 0.0

    for i, (x_batch, y_batch) in enumerate(train_dataloader):
        optimizer.zero_grad()
        outputs = model(x_batch)
        loss = criterion(outputs, y_batch)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()

    avg_train_loss = running_loss / len(train_dataloader)

    if improvement_block:
        model.eval()
        val_loss = 0.0
        with torch.no_grad():
            for x_batch_val, y_batch_val in val_dataloader:
                val_outputs = model(x_batch_val)
                loss = criterion(val_outputs, y_batch_val)
                val_loss += loss.item()

        avg_val_loss = val_loss / len(val_dataloader)

        if avg_val_loss < best_val_loss:
            best_val_loss = avg_val_loss
            patience_counter = 0
        else:
            patience_counter += 1

        if patience_counter >= patience:
            print(f"Early stopping after {epoch + 1} epochs due to no improvement.")
            break

    if (epoch + 1) % 10 == 0:
        print(f'Epoch {epoch+1}/{epochs}, Train Loss: {avg_train_loss:.6f}, Validation Loss: {avg_val_loss:.6f}')

This loop includes early stopping based on validation loss, but even with this setup, I end up with constant predictions after training.
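For context, the objects the loop relies on are created earlier in the notebook. Roughly, the setup looks like this (the loss, optimizer, learning rate, and dummy datasets below are placeholders to keep the sketch self-contained, not my exact configuration):

from torch.utils.data import DataLoader, TensorDataset

model = LSTMModel(input_size=2, hidden_size=32, num_layers=1, output_size=1)
criterion = nn.MSELoss()                                    # regression loss on delta_sigma
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)   # the learning rate is one hyperparameter I varied

# Dummy tensors only so the sketch runs: shapes (batch, seq_len=1, 2 inputs) -> (batch, 1 output).
train_dataloader = DataLoader(TensorDataset(torch.randn(64, 1, 2), torch.randn(64, 1)),
                              batch_size=16, shuffle=True)
val_dataloader = DataLoader(TensorDataset(torch.randn(16, 1, 2), torch.randn(16, 1)),
                            batch_size=16)

improvement_block = True  # flag that enables the validation / early-stopping branch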

Prediction Block:

# predict_oedometer is a helper defined in the repository; it runs the trained model
# on the given inputs and returns the predicted and true delta_sigma values.
predicted_delta_sigma, true_delta_sigma = predict_oedometer(
    model,
    example_sigma_t_input,
    example_delta_epsilon_input,
    min_val,
    max_val,
    normalize=normalize
)

Here, I’m making predictions with the trained model. Despite the steadily decreasing loss, the predictions end up being constant.

Conclusion:

I’m still relatively new to the field and am learning together with my professors, but I’ve hit a wall. I would be really grateful for any insights or suggestions on where I might be going wrong.

Thanks in advance for your help!

Hi Lukas!

Looking at the “Prediction vs. Real” chart you’ve posted, it looks like your model is learning
to predict the mean of your “Real” target values (but not the actual values on a per-sample
basis).

This would explain your sensibly falling loss as you train – the loss gets smaller as your
model better learns the mean value in question.

I would suggest trying to get your model to overfit. Take, for example, a single batch of five
or ten samples, and train repeatedly (perhaps for quite a long time) with that one batch. Does
your model learn to predict the correct target values just for the batch? (Don’t expect such an
overfit model to “generalize” to the rest of your dataset.) If you can get your model to overfit,
can you get it to overfit on a larger, but still small set of samples?
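A rough sketch of such a test, reusing your model, criterion, optimizer, and train_dataloader (the iteration count here is just an illustrative choice):

# Take a single small batch and train on it over and over.
x_small, y_small = next(iter(train_dataloader))

model.train()
for step in range(5000):                      # "quite a long time" for a handful of samples
    optimizer.zero_grad()
    loss = criterion(model(x_small), y_small)
    loss.backward()
    optimizer.step()

print(loss.item())                            # should approach zero if the model can memorize the batch
print(model(x_small).squeeze())               # compare per-sample predictions ...
print(y_small.squeeze())                      # ... against the per-sample targets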

If you can get your model to overfit, it is able to learn to predict per-sample values, rather than
just the overall mean. I would then try training on your full training set for a much longer time.

It is plausible that it’s easy for your model to learn the overall mean, but harder to learn the
per-sample values. Learning the mean could give you a nicely falling loss curve that then
plateaus, and you simply might not be training (anywhere near) long enough to see the “real,”
more difficult per-sample learning start to kick in.

You say that you normalize your data, but from the looks of your chart, it appears that you are
not normalizing your target values to have mean zero. I would recommend normalizing your
target values to mean zero and standard deviation one. Now a randomly-initialized model will
start out making predictions whose mean value is about zero, so your model can focus on
learning the sample-to-sample differences (rather than just the mean).
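Concretely, that could look something like the following (a sketch; y_train, y_val, and x_example are placeholders for your tensors):

# Standardize the targets with statistics taken from the training set only.
y_mean = y_train.mean()
y_std = y_train.std()
y_train_norm = (y_train - y_mean) / y_std
y_val_norm = (y_val - y_mean) / y_std

# Train on the standardized targets, then map predictions back to the original units:
with torch.no_grad():
    pred = model(x_example) * y_std + y_mean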

If you can’t get your model to overfit a (very) small dataset, then you should check your code
carefully for bugs. (You might consider whether there is some peculiarity in your architecture
that impedes training, although this seems less likely.)

For example, if your model were somehow disconnected in the middle so that the inputs never
made it to the output, the bias in the final layer could, nonetheless, learn to predict the overall
mean of the target values.

Best.

K. Frank


Hello Frank,
Thank you for your detailed analysis and for taking the time to check everything out! I will try to fix my code following your suggestions and will give you an update on how it went.

Thank you very much for the support, and have a great day!

Lukas