Sequence ordering with TimeSeriesDataSet.to_dataloader() from PyTorch Forecasting

pcc · November 4, 2021, 3:50pm

PyTorch-Forecasting version: 0.9.0
PyTorch version: 1.9.0
Python version: 3.9.6
Operating System: Windows 10

I’ve been using PyTorch Forecasting and trying to understand its internals. I’ve run into something that I find odd in the TimeSeriesDataSet class.

When I use TimeSeriesDataSet.to_dataloader() to get a dataloader for training the model, the values of the categorical and continuous variables don’t seem to be ordered as a sequence.

Actual behavior

For example, I get the following for the encoder_cat:
'encoder_cat': tensor([[[ 0, 1, 7],
[ 0, 2, 7],
[ 0, 13, 7],
[ 0, 18, 7],
[ 0, 19, 7],
[ 0, 20, 7],
[ 0, 21, 7],
[ 0, 22, 7],
[ 0, 23, 7],
[ 0, 24, 7],
[ 0, 3, 7],
[ 0, 4, 7],
[ 0, 5, 7],
[ 0, 6, 7],
[ 0, 7, 7],
[ 0, 8, 7],
[ 0, 9, 7]], …

The first column is the city group, the second is the hour, and the third is the day of the week.
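
One thing that may be worth checking before concluding that the rows are shuffled: the values in the hour column are encoded category indices, not necessarily the raw hours. The sketch below is only an illustration under two assumptions that are not confirmed here from the library source: that the hour labels are strings sorted lexicographically ("0", "1", "10", "11", ..., "19", "2", ...), and that index 0 is reserved so the codes start at 1. The names `labels` and `code_of` are hypothetical.

```python
# Assumption: hour labels are strings and are sorted lexicographically
# before being assigned integer codes.
labels = sorted(str(h) for h in range(24))  # "0", "1", "10", ..., "19", "2", ...

# Assumption: index 0 is reserved (e.g. for unknown values), so codes start at 1.
code_of = {label: i + 1 for i, label in enumerate(labels)}

# Codes for hours 0..16 taken in chronological order.
encoded = [code_of[str(h)] for h in range(17)]
print(encoded)
```

Under these assumptions, hours 0 to 16 in chronological order map to 1, 2, 13, 18, 19, 20, 21, 22, 23, 24, 3, 4, 5, 6, 7, 8, 9, which is the same pattern as the second column above. If that is what is happening, the rows would be in time order and only the integer codes would look out of order.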

Expected behavior

I was expecting to have something like this instead:
'encoder_cat': tensor([[[ 0, 1, 7],
[ 0, 2, 7],
[ 0, 3, 7],
[ 0, 4, 7],
[ 0, 5, 7],
[ 0, 6, 7],
[ 0, 7, 7],
[ 0, 8, 7],
[ 0, 9, 7],
[ 0, 10, 7],
[ 0, 11, 7],
[ 0, 12, 7],
[ 0, 13, 7],
[ 0, 14, 7],
[ 0, 15, 7],
[ 0, 16, 7],
[ 0, 17, 7]],

Questions

Based on this, can I assume that the observations are not fed into the model as an ordered sequence? If so, why would that approach be desirable when modelling time series? I have read that there are stateful and stateless approaches to time series forecasting with neural networks. Could someone explain the difference in a bit more detail?

The same shuffling also happens in the validation set. There may be a reason for it, but it does not make sense to me, since in production I will feed my model an ordered sequence of data to get the next X predictions.

I would greatly appreciate it if someone could explain how this works internally: either why shuffled sequences are used, or why the categorical variables are stored in the wrong order, and how this might affect the model in production.

Thank you