I can't overfit on a simple batch with LSTM applied to Time Series data

Hello :smile: ,

I am new to PyTorch, and I built an LSTM model with embeddings to predict a target of size 720 from time series data with a sequence length of 14 and more than 18,000 features (which include date-related data).
The problem is that my model always outputs the average of all the labels it saw during training. To make sure of this, I tried to overfit my model on a single batch of size 5, and the results confirm what I thought:


The two screenshots above show the model’s predictions for the 86th component of the five targets in the batch, and likewise for the 57th one; as you can see, it’s nearly the average that gets predicted.

This is the code of the model:
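
(The code block itself didn’t survive in this copy of the post; below is a minimal sketch of a model matching the description above, with hypothetical names rather than the original code.)

```python
import torch
import torch.nn as nn

class LSTMForecaster(nn.Module):
    """Hypothetical reconstruction: embeds week/day indices, concatenates
    them with the remaining features, and feeds the result to an LSTM."""
    def __init__(self, num_features=18000, hidden_size=720, output_size=720):
        super().__init__()
        self.week_emb = nn.Embedding(52, 28)  # week-of-year: vocab 52 -> dim 28
        self.day_emb = nn.Embedding(7, 4)     # day-of-week:  vocab 7  -> dim 4
        # the two raw date columns are replaced by their embeddings
        self.lstm = nn.LSTM((num_features - 2) + 28 + 4, hidden_size,
                            batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x, week_idx, day_idx):
        # x: (batch, 14, num_features - 2); week_idx/day_idx: (batch, 14) ints
        inp = torch.cat([x, self.week_emb(week_idx), self.day_emb(day_idx)],
                        dim=-1)
        out, _ = self.lstm(inp)
        return self.fc(out[:, -1])  # next day's 720 counts from the last step
```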

This is how I made the training:
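
(Likewise, the training snippet is missing here; a minimal sketch of an overfitting loop on one fixed batch of 5, assuming Adam, L1Loss, and a hypothetical train_loader.)

```python
# hypothetical single-batch overfitting loop; train_loader is assumed
# to yield (features, week_idx, day_idx, target) tuples
model = LSTMForecaster()
criterion = nn.L1Loss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x, week_idx, day_idx, target = next(iter(train_loader))  # one fixed batch of 5

model.train()
for step in range(1000):
    optimizer.zero_grad()
    loss = criterion(model(x, week_idx, day_idx), target)
    loss.backward()
    optimizer.step()
    if step % 100 == 0:
        print(f"step {step}: L1 loss = {loss.item():.4f}")
```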

1 - I don’t know what I am doing wrong: the model can’t make an exact prediction for each input, and instead always outputs roughly the average of the targets in the training set.
The L1 loss obviously doesn’t decrease to 0, but gets stuck around 0.98 on the training data.

2 - I would also like to know whether the embeddings are used correctly here: I encoded the two datetime-related variables (week and day), as you can see in the code above, and concatenated them with the remaining features before feeding everything to the LSTM.
In the first embedding, for the weeks, I go from a vocabulary of size 52 to an embedded vector of size 28.
In the second one, for the days, I go from a vocabulary of size 7 to an embedded vector of size 4.

If there’s anything wrong in the code, please don’t hesitate to mention it :smile: ; maybe that’s the reason my model can’t overfit and learn properly.

NB: The input I give to the model is a sequence of length 14 representing 14 successive days; for each day I give the model all the features WITH their corresponding target, and the expected output is the target of the day AFTER.
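
(A minimal sketch of how such (sequence, next-day target) pairs could be built; make_windows and the tensor shapes are assumptions, not the original preprocessing.)

```python
import torch

def make_windows(features, targets, seq_len=14):
    # features: (num_days, num_features), targets: (num_days, 720)
    # each input is 14 consecutive days of [features | target];
    # the label is the target of the following day
    inputs, labels = [], []
    for t in range(len(features) - seq_len):
        window = torch.cat([features[t:t + seq_len],
                            targets[t:t + seq_len]], dim=-1)
        inputs.append(window)
        labels.append(targets[t + seq_len])
    return torch.stack(inputs), torch.stack(labels)

X, y = make_windows(torch.randn(100, 18000), torch.randn(100, 720))
print(X.shape, y.shape)  # torch.Size([86, 14, 18720]) torch.Size([86, 720])
```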

Thank you very much ! :innocent:

You’re trying to do classification with a distance-based loss. You’d need to output label scores of shape (batch_size, output_size, num_classes), treat them as probabilities (softmax) and use an appropriate loss like CrossEntropyLoss. Or do something else, but don’t treat labels as floating point numbers.
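
(For concreteness, a shape sketch of that suggestion with assumed sizes; note that PyTorch’s CrossEntropyLoss wants the class dimension in position 1 and applies log-softmax internally, so raw scores go in directly.)

```python
import torch
import torch.nn as nn

batch_size, output_size, num_classes = 5, 720, 10  # all sizes assumed

scores = torch.randn(batch_size, output_size, num_classes)  # model output
targets = torch.randint(0, num_classes, (batch_size, output_size))
# CrossEntropyLoss expects (N, C, ...), hence the permute
loss = nn.CrossEntropyLoss()(scores.permute(0, 2, 1), targets)
```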

NB: date embeddings are probably fine; you could also encode them with Fourier series coefficients or one-hot encoding, avoiding some backprop.
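
(A minimal sketch of that fixed, non-learned encoding idea, using the first sin/cos pair of a Fourier encoding for a cyclic feature like day-of-week.)

```python
import math
import torch

def cyclical_encode(idx, period):
    # fixed sin/cos pair (first Fourier harmonic): no learned parameters,
    # so nothing extra to backprop through
    angle = 2 * math.pi * idx.float() / period
    return torch.stack([torch.sin(angle), torch.cos(angle)], dim=-1)

days = torch.arange(7)
print(cyclical_encode(days, 7))  # shape (7, 2)
```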


Thank you for your answer,

Actually, the label I am trying to predict is a vector of size 720 where each component represents a “count”, so it ranges from 0 to +infinity; it’s a non-negative integer.
Therefore, I think it’s a regression problem, or maybe I am totally wrong, but I would need more clarification if possible.
I didn’t understand what you meant by this sentence:

You’d need to output label scores of shape (batch_size, output_size, num_classes)

I tried both L1Loss and MSELoss, but they lead to the same problem.
Thank you for the note on the embeddings; for the moment I will keep them as they are and first try to solve the problem of not being able to overfit.

Ok, then try using nn.PoissonNLLLoss
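
(A minimal usage sketch with shapes taken from the thread: a batch of 5 samples, 720 counts each.)

```python
import torch
import torch.nn as nn

# with log_input=True (the default) the model outputs log-rates, so the
# network needs no positivity constraint on its raw outputs
criterion = nn.PoissonNLLLoss(log_input=True)

log_rate = torch.randn(5, 720)                    # predicted log Poisson rates
counts = torch.randint(0, 100, (5, 720)).float()  # observed counts (targets)
loss = criterion(log_rate, counts)
```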

Hello,

It didn’t work; the loss wasn’t even decreasing.
Thank you for your help !

Well, I’m pretty sure the Poisson log-PDF is the thing to use here. If “overfit” mode doesn’t work, some reasons are:
1) lr is too low - try tuning it or using Adadelta
2) LSTM gates are saturated - 18k inputs is kinda extreme, so you’d need normalization, “compressing” layers and/or other strategies to create good RNN inputs (see the sketch after this list)
3) with a hidden_size:output_size ratio of 1:1, the LSTM “memory” is heavily contested by column predictions - some columns may be underfit as a result; to isolate this issue, try optimizing single columns (this issue wouldn’t prevent the loss from decreasing, though)
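
(For point 2, a sketch of one possible “compressing” front-end; all sizes here are assumptions.)

```python
import torch
import torch.nn as nn

# normalize the raw features, then project 18k -> 512 before the LSTM
compress = nn.Sequential(
    nn.LayerNorm(18000),
    nn.Linear(18000, 512),
    nn.ReLU(),
)
lstm = nn.LSTM(512, 720, batch_first=True)

x = torch.randn(5, 14, 18000)  # (batch, seq_len, features)
out, _ = lstm(compress(x))     # compress(x): (5, 14, 512)
```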
