Hello everyone,
I’m encountering issues while working on an LSTM implementation with online training. There are two primary problems:
- Overfitting during Online Training:
  - I can train my model with satisfactory results. However, when I simulate online training over a year, the model shows significant overfitting: the predicted values (`y_pred`) mirror the true values (`y_true`) but with a one-day lag (a quick check illustrating this is sketched right after this list).
- Spikes in Predictions:
  - Occasionally, the LSTM produces spikes in its predictions. I've tried batch normalization as a fix: applying it in the training loop works, but it fails during online training because the network is fed a batch of length 1 at a time.
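To make the lag concrete, here is the kind of quick check I mean (illustrative only; it assumes `y_true` and `y_pred` are aligned 1-D NumPy arrays of daily values). If the prediction is essentially yesterday's true value, the last error collapses towards zero:

```python
import numpy as np

def lag_check(y_true, y_pred):
    """Compare the model against a naive one-day persistence forecast."""
    mae_model = np.mean(np.abs(y_pred[1:] - y_true[1:]))
    # persistence baseline: predict today's value with yesterday's true value
    mae_persistence = np.mean(np.abs(y_true[:-1] - y_true[1:]))
    # error of the model against *yesterday's* truth: near zero if y_pred is just lagging
    mae_vs_shifted = np.mean(np.abs(y_pred[1:] - y_true[:-1]))
    return mae_model, mae_persistence, mae_vs_shifted
```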
Here’s an overview of my code structure:
- I have implemented the code as a script with various functions, such as `split_sequences` and `online_training`.
- The architecture involves creating sequences from the input data. For instance, if `X` has a shape of (2000, 14), with 14 columns/features, applying the sequence function results in a shape of roughly (2000, 7, 14) (a few rows are consumed by the window), with 7 as the lookback value. A short usage example follows the function below.
import numpy as np

def split_sequences(input_sequences, output_sequence, n_steps_in, n_steps_out):
    X, y = list(), list()  # instantiate X and y
    for i in range(len(input_sequences)):
        # find the end of the input and output sequences
        end_ix = i + n_steps_in
        out_end_ix = end_ix + n_steps_out - 1
        # check if we are beyond the dataset
        if out_end_ix > len(input_sequences):
            break
        # gather the input and output parts of the pattern
        seq_x = input_sequences[i:end_ix]
        seq_y = output_sequence[end_ix - 1:out_end_ix, -1]
        X.append(seq_x)
        y.append(seq_y)
    return np.array(X), np.array(y)
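For instance, with random data standing in for the real features (the exact row count just depends on the window sizes):

```python
import numpy as np

X_raw = np.random.rand(2000, 14)   # 14 features
y_raw = np.random.rand(2000, 1)    # target as the last (only) column
X, y = split_sequences(X_raw, y_raw, n_steps_in=7, n_steps_out=7)
print(X.shape, y.shape)            # (1988, 7, 14) (1988, 7)
```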
- The data is then split into training, validation, and test tensors: the test set (`X_test`) consists of 365 days, and the remaining data is split 80/20 into training and validation tensors.
- The initial training fits the model on `X_train`, with early stopping based on the validation set `X_val`.
- To simulate real-world usage, the `update_loop` function runs the model every day, forecasts the next 7 days, retrains the model on the new data, and repeats the process (a stripped-down sketch of this loop is shown below).
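To make the flow concrete, here is a simplified sketch of that daily loop (not the exact code from the repo; the model, optimizer, scaling, and early stopping are all abbreviated, and `X_all`/`y_all` are assumed to be the tensors produced by `split_sequences`):

```python
import torch

def update_loop(model, optimizer, criterion, X_all, y_all, start_idx, n_days, n_epochs=5):
    """Simulate online use: each 'day', forecast the next 7 days,
    then fine-tune the model on the data observed so far."""
    forecasts = []
    for day in range(start_idx, start_idx + n_days):
        # forecast with the current model (batch of length 1)
        model.eval()
        with torch.no_grad():
            forecasts.append(model(X_all[day:day + 1]))

        # retrain on everything observed up to 'day' (simplified: full refit, no early stopping)
        model.train()
        for _ in range(n_epochs):
            optimizer.zero_grad()
            y_pred = model(X_all[:day])
            loss = criterion(y_pred, y_all[:day])
            loss.backward()
            optimizer.step()
    return torch.cat(forecasts)
```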
Despite setting `h_prev` and `c_prev` to `None` at every step and trying different sequence lengths (even 1, to hide the previous days' data), the issue persists. Additionally, modifying the loss computation to focus only on the initial values of `y_true` and `y_pred` did not yield any improvement.
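By "focusing only on the initial values" I mean roughly this (illustrative; `y_pred` and `y_true` have shape (batch, n_steps_out) and `criterion` is the usual MSE loss):

```python
# penalize only the first forecast step of each 7-day prediction
loss = criterion(y_pred[:, :1], y_true[:, :1])
```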
I’m seeking guidance on resolving the one-day lag issue and understanding why the model is consistently producing almost correct outputs during online training, even with a poorly trained initial model.
Additionally, anyone interested can find a test reproducing the problem on my GitHub repository. The dataset provided there is not the private dataset I'm currently working with, but the same issue arises with the randomly generated dataset used for testing.
You can find the code and an illustration of the problem in this repository:
By default, requirement.txt installs the CPU build of torch; feel free to reinstall torch with CUDA in your environment to speed things up.