Hello,
I’m working on a non-linear problem using a simple neural network model, and I’m encountering a strange issue with my predictions. The model is designed to predict a single output parameter (delta_sigma) from two input parameters (sigma_t and delta_epsilon). I’ve normalized the data and tested different hyperparameters during training, but I’m still getting constant prediction values, no matter the inputs.
Problem Description:
- The model appears to learn initially, as I see the loss function decrease over time. Both the training loss and validation loss decrease significantly, which seems normal.
- However, after some time the predictions stabilize and remain constant, even when I change the input values. The constant is usually around 1000; sometimes it fluctuates a little between runs, but right now the output is simply fixed.
- I’ve tried experimenting with different hyperparameters, such as the learning rate and the network architecture, but that hasn’t resolved the issue. The model keeps outputting the same value for every input, regardless of the changes.
Steps I’ve Taken:
- I started with simple code and gradually added complexity to explore potential causes, such as overfitting or issues with normalization.
- I’ve also seen posts mentioning data-processing issues as a potential cause of this behavior, but I can’t pinpoint what might be wrong with my data-processing pipeline.
- The dataset is small, and even though the loss looks good (both the training loss and the validation loss), the final predictions are all identical.
Request:
- I’d appreciate any help in identifying what might be going wrong.
- I’m attaching a link to my GitHub repository where you can find the essential details in the README (the notebook specifically explains the process and shows what happens in the last iteration).
- The files themselves aren’t necessary to look at; the README should be enough to understand the issue.
Important Code Blocks:
Here are some key pieces of the code that might help in understanding the issue:
Model Definition:
    import torch
    import torch.nn as nn

    class LSTMModel(nn.Module):
        def __init__(self, input_size, hidden_size, num_layers, output_size):
            super(LSTMModel, self).__init__()
            # batch_first=True: input shape is (batch, seq_len, input_size)
            self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
            # Map the last hidden state to the single output value
            self.fc = nn.Linear(hidden_size, output_size)
            self.relu = nn.ReLU()  # defined but currently unused in forward()

        def forward(self, x):
            # hn: (num_layers, batch, hidden_size); hn[-1] is the last layer's hidden state
            _, (hn, _) = self.lstm(x)
            out = self.fc(hn[-1])
            return out
This LSTM-based architecture is deliberately simple and is designed to predict delta_sigma from the inputs sigma_t and delta_epsilon.
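For context, this is roughly how I arrange the two inputs before passing them to the model (the shapes and values here are illustrative, not taken from my notebook):

    # Hypothetical batch of 4 samples, sequence length 1, two features per step:
    # (sigma_t, delta_epsilon) -> shape (batch, seq_len, input_size) = (4, 1, 2)
    x = torch.tensor([
        [[100.0, 0.01]],
        [[200.0, 0.02]],
        [[400.0, 0.01]],
        [[800.0, 0.03]],
    ])

    model = LSTMModel(input_size=2, hidden_size=32, num_layers=1, output_size=1)
    print(model(x).shape)  # torch.Size([4, 1]) -- one delta_sigma per sample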
Data Normalization:
    def min_max_normalize(tensor):
        """
        Normalizes the tensor using global min_val and max_val.
        """
        global min_val, max_val
        # min_val/max_val are computed once, from the first tensor this
        # function sees, and reused for every subsequent call
        if min_val is None or max_val is None:
            min_val = torch.min(tensor)
            max_val = torch.max(tensor)
        return (tensor - min_val) / (max_val - min_val)
Data normalization ensures that all inputs are on the same scale. However, the output still seems to stay constant despite this normalization.
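For reference, this is how the scaling is applied and undone; the denormalize helper and the example tensors are my own illustrative sketch, not copied from the repo:

    min_val, max_val = None, None  # globals consumed by min_max_normalize

    def min_max_denormalize(tensor):
        # Inverse of min_max_normalize, using the same stored globals
        return tensor * (max_val - min_val) + min_val

    x = torch.tensor([[100.0, 0.01], [200.0, 0.02]])  # illustrative inputs
    y = torch.tensor([[950.0], [1050.0]])             # illustrative targets

    x_norm = min_max_normalize(x)  # first call fixes min_val/max_val from x
    y_norm = min_max_normalize(y)  # reuses the min_val/max_val computed from x
    y_back = min_max_denormalize(y_norm)  # recovers the original y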
Training Loop with Early Stopping:
    best_val_loss = float('inf')
    patience = 20
    patience_counter = 0
    epochs = 10000
    improvement_block = True  # flag in my notebook that enables the validation / early-stopping block

    for epoch in range(epochs):
        model.train()
        running_loss = 0.0
        for i, (x_batch, y_batch) in enumerate(train_dataloader):
            optimizer.zero_grad()
            outputs = model(x_batch)
            loss = criterion(outputs, y_batch)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
        avg_train_loss = running_loss / len(train_dataloader)

        if improvement_block:
            # Evaluate on the validation set without tracking gradients
            model.eval()
            val_loss = 0.0
            with torch.no_grad():
                for x_batch_val, y_batch_val in val_dataloader:
                    val_outputs = model(x_batch_val)
                    loss = criterion(val_outputs, y_batch_val)
                    val_loss += loss.item()
            avg_val_loss = val_loss / len(val_dataloader)

            # Early stopping: reset the counter on improvement, otherwise count up
            if avg_val_loss < best_val_loss:
                best_val_loss = avg_val_loss
                patience_counter = 0
            else:
                patience_counter += 1
                if patience_counter >= patience:
                    print(f"Early stopping after {epoch + 1} epochs due to no improvement.")
                    break

        if (epoch + 1) % 10 == 0:
            print(f'Epoch {epoch+1}/{epochs}, Train Loss: {avg_train_loss:.6f}, Validation Loss: {avg_val_loss:.6f}')
This loop includes early stopping based on validation loss, but even with this setup, I end up with constant predictions after training.
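To demonstrate the symptom, this is the kind of check I run after training (the input values are made up for illustration):

    model.eval()
    with torch.no_grad():
        for sigma_t, delta_epsilon in [(100.0, 0.01), (500.0, 0.02), (900.0, 0.05)]:
            x = torch.tensor([[[sigma_t, delta_epsilon]]])  # shape (1, 1, 2)
            x = min_max_normalize(x)                        # same scaling as in training
            print(sigma_t, delta_epsilon, model(x).item())
    # All three printed predictions come out (almost) identical.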
Prediction Block:
predicted_delta_sigma, true_delta_sigma = predict_oedometer(
model,
example_sigma_t_input,
example_delta_epsilon_input,
min_val,
max_val,
normalize=normalize
)
Here, I’m making predictions with the trained model. Despite the steadily decreasing loss, the predictions come out constant.
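Conceptually, the prediction path looks like the sketch below; this is my own simplified reconstruction, not the actual predict_oedometer body (which is in the repo):

    def predict_sketch(model, sigma_t, delta_epsilon, min_val, max_val, normalize=True):
        # Build a single-sample input: (batch=1, seq_len=1, features=2)
        x = torch.tensor([[[sigma_t, delta_epsilon]]])
        if normalize:
            x = (x - min_val) / (max_val - min_val)  # same min-max scaling as training
        model.eval()
        with torch.no_grad():
            y = model(x)
        if normalize:
            y = y * (max_val - min_val) + min_val  # undo the scaling on the output
        return y.item()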
Conclusion:
I’m still relatively new to the field and am learning together with my professors, but I’ve hit a wall. I would be really grateful for any insights or suggestions on where I might be going wrong.
Thanks in advance for your help!