PyTorch Forecasting Dataloader/TimeSeriesDataSet with two differently scaled DataFrames (by unix) as Input/Output

eTuDpy · January 28, 2022, 2:02pm

I am trying to use pytorch_forecasting.TimeSeriesDataSet from PyTorch Forecasting. My difficulty is that I have two differently scaled and shaped DataFrames as input and output data.

Simply put, my input_df looks like this:

    unix_timestamp[ms]         value_a         value_b
0        1609455600000               2               3 
1        1609455600010               2               4
2        1609455600020               4               5
3        1609455600030               6               6
...                ...             ...             ...

Here, unix_timestamp is a running integer in milliseconds, and each row represents the value of a 10 ms interval.

Whereas my output_df looks like this:

    unix_timestamp[ms]    target_value
0        1609455600000               9
1        1609455660000               8
2        1609455720000               7
3        1609455780000               6
...                ...             ...

In this case, each row represents the value of a 1-minute interval!

Now I would like to use a time window of 10 minutes from the input_df (so 600000 ms and therefore 60000 rows) to predict 1 minute of the output_df (therefore 1 row).

How do I use pytorch_forecasting.TimeSeriesDataSet to prepare these two DataFrames this way?

Important Note I:
The unix_timestamp values of the two DataFrames do not necessarily line up as shown in the example above. For instance, if the input_df has a timestamp of 1601596805783, which corresponds to '2020-10-02T00:00:05.783', that does not mean this exact timestamp exists in the output_df. It might be very close, but it is usually off by a couple of milliseconds rather than matching to the exact millisecond.
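A minimal sketch of one way to pair such near-miss timestamps, assuming pandas is acceptable for preprocessing: pandas.merge_asof with direction="nearest" and a tolerance matches each output row to the closest input row within a few milliseconds. The column names unix_timestamp_ms and input_ts_ms, the toy values, and the 5 ms tolerance are assumptions for illustration, not part of the original data.

```python
import pandas as pd

# Toy frames; column names and values are illustrative assumptions.
input_df = pd.DataFrame({
    "unix_timestamp_ms": [1601596805783, 1601596805793, 1601596805803],
    "value_a": [2, 2, 4],
})
output_df = pd.DataFrame({
    "unix_timestamp_ms": [1601596805781, 1601596865781],
    "target_value": [9, 8],
})

# Keep a copy of the input timestamp so the matched row can be identified.
right = input_df.assign(input_ts_ms=input_df["unix_timestamp_ms"])

# merge_asof requires both frames to be sorted by the key column.
aligned = pd.merge_asof(
    output_df.sort_values("unix_timestamp_ms"),
    right.sort_values("unix_timestamp_ms"),
    on="unix_timestamp_ms",
    direction="nearest",  # closest input row, earlier or later
    tolerance=5,          # assumed maximum offset in ms
)
print(aligned)
```

Output rows without an input row inside the tolerance get NaN in the input columns, which makes gaps easy to spot.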

Important Note II:
I thought about just scaling up the output_df to the same scale by repeating each value within its associated time interval. However, as far as I can judge, this should distort the prediction result, shouldn't it?

nivek · February 1, 2022, 9:27pm

Since the pytorch_forecasting library is not maintained by the PyTorch team, you may get a better response if you ask this question in their repository/forum.

DataLoader does allow you to pass in a custom batch_sampler, which lets you specify how the sampling process works and fetch 10 minutes of input data at a time. You can find more details on the torch.utils.data page of the PyTorch 2.1 documentation.
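As a rough sketch of the windowing idea in plain PyTorch: the example below uses a map-style Dataset rather than a custom batch_sampler, since that is often the simpler way to pair one target row with a block of input rows. The class name WindowedDataset, the column name unix_timestamp_ms, and the toy data are hypothetical. It assumes the input is sampled roughly regularly, so a fixed number of rows approximates the time window. For the data in the question, rows_per_window would be 60000.

```python
import numpy as np
import pandas as pd
import torch
from torch.utils.data import Dataset, DataLoader


class WindowedDataset(Dataset):
    """Hypothetical helper: one sample per target row, made of the last
    `rows_per_window` input rows strictly before the target timestamp."""

    def __init__(self, input_df, output_df, rows_per_window,
                 ts_col="unix_timestamp_ms", target_col="target_value"):
        input_df = input_df.sort_values(ts_col)
        ts = input_df[ts_col].to_numpy()
        feats = input_df.drop(columns=[ts_col]).to_numpy(dtype=np.float32)
        self.x = torch.tensor(feats)
        # Index of the first input row at or after each target timestamp;
        # no exact timestamp match is required.
        end = np.searchsorted(ts, output_df[ts_col].to_numpy(), side="left")
        keep = end >= rows_per_window  # drop targets without full history
        self.end = end[keep]
        y = output_df[target_col].to_numpy(dtype=np.float32)[keep]
        self.y = torch.tensor(y)
        self.rows = rows_per_window

    def __len__(self):
        return len(self.end)

    def __getitem__(self, i):
        stop = int(self.end[i])
        return self.x[stop - self.rows:stop], self.y[i]


# Toy data: 10 ms input steps, targets slightly off the input grid.
start = 1609455600000
n = 300
steps = np.arange(n, dtype=np.int64)
input_df = pd.DataFrame({
    "unix_timestamp_ms": start + 10 * steps,
    "value_a": steps,
    "value_b": steps * 2,
})
output_df = pd.DataFrame({
    "unix_timestamp_ms": start + np.array([0, 1003, 2004], dtype=np.int64),
    "target_value": [9, 8, 7],
})

ds = WindowedDataset(input_df, output_df, rows_per_window=100)
loader = DataLoader(ds, batch_size=2)
xb, yb = next(iter(loader))
print(xb.shape, yb.shape)
```

The first target is dropped here because it has no preceding input rows. Each batch then has shape (batch, rows_per_window, n_features) for the inputs and (batch,) for the targets.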