shuffle=True gives weird results

Yu_Syuan_Sean_Lin · June 1, 2020, 1:36pm

I am trying to predict stock data with a simple LSTM model.
I split the first 80% of the data off as the training set and the last 20% as the test set.

The data shape is (batch_size, seq_len, feature_dims).

Here is my code for the data loader:

train_loader = Data.DataLoader(dataset=trainDataset, batch_size=BATCH_SIZE, shuffle=True)

When I set shuffle=False, everything works fine (left pics).

However, when I change shuffle to True, the predictions become weird (right pics): the output values get stuck in a very small range.

Can anyone give me any suggestions? Thanks!

ptrblck · June 2, 2020, 6:41am

If shuffling interferes negatively with your training, it seems that your training routine or model depends on the sequential input of the data.
E.g. if you are using an RNN-like model, you might pass the batches sequentially and reuse the hidden states. In that case, shuffling the data might destroy the temporal correspondence.
You could try to shuffle “segments” of the data, but it depends of course on your actual use case.
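
As a rough illustration of the "segments" idea, here is a minimal sketch in plain Python. It cuts the sample indices into contiguous blocks, shuffles the order of the blocks, and keeps the order inside each block. The helper name segment_shuffled_indices and the segment_len value are made up for this example and are not part of PyTorch; a suitable segment length would depend on how long the temporal dependency in your data is.

```python
import random

def segment_shuffled_indices(n_samples, segment_len, seed=None):
    """Return all indices 0..n_samples-1, shuffled block-wise.

    Hypothetical helper: the order of the contiguous segments is randomized,
    but the samples inside each segment stay in their original order.
    """
    rng = random.Random(seed)
    # Cut the index range into contiguous segments (the last one may be shorter).
    segments = [
        list(range(start, min(start + segment_len, n_samples)))
        for start in range(0, n_samples, segment_len)
    ]
    # Shuffle whole segments, not individual samples.
    rng.shuffle(segments)
    # Flatten back into a single list of indices.
    return [i for segment in segments for i in segment]

indices = segment_shuffled_indices(n_samples=10, segment_len=4, seed=0)
print(indices)
```

The resulting list could then be passed to the DataLoader through its sampler argument (with shuffle left at its default), so that the batches are drawn in this order. If the hidden state is carried across batches, it would probably also need to be reset at each segment boundary, which this sketch does not handle.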