I am trying to use pytorch_forecasting.TimeSeriesDataSet from PyTorch Forecasting. My difficulty is that my input and output data are two DataFrames with different time resolutions and shapes.
Simply put, my input_df looks like this:
unix_timestamp[ms] value_a value_b
0 1609455600000 2 3
1 1609455600010 2 4
2 1609455600020 4 5
3 1609455600030 6 6
... ... ... ...
Here, unix_timestamp is a running integer in milliseconds, and each row represents the value of a 10 ms interval.
My output_df, by contrast, looks like this:
unix_timestamp[ms] target_value
0 1609455600000 9
1 1609455660000 8
2 1609455720000 7
3 1609455780000 6
... ... ...
In this case, each row represents the value of a 1-minute interval!
Now I would like to use a time window of 10 minutes from input_df (600000 ms, and therefore 60000 rows) to predict 1 minute of output_df (therefore 1 row).
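To make the intended windowing concrete, here is a minimal sketch in plain pandas and NumPy (not TimeSeriesDataSet itself). It uses synthetic data, and it assumes that the 10-minute window ends exactly where the target's 1-minute interval starts; the helper name input_window is hypothetical.

```python
import numpy as np
import pandas as pd

STEP_MS = 10                             # sampling interval of input_df
WINDOW_MS = 10 * 60_000                  # 10-minute input window
ROWS_PER_WINDOW = WINDOW_MS // STEP_MS   # 600000 / 10 = 60000 rows

# Synthetic stand-in for input_df: 20 minutes of data on a 10 ms grid.
start = 1609455600000
n_rows = 2 * ROWS_PER_WINDOW
input_df = pd.DataFrame({
    "unix_timestamp[ms]": start + STEP_MS * np.arange(n_rows, dtype="int64"),
    "value_a": np.arange(n_rows) % 7,
    "value_b": np.arange(n_rows) % 5,
})

def input_window(df, target_ts, window_ms=WINDOW_MS):
    """Return the rows with target_ts - window_ms <= timestamp < target_ts.

    Assumes df is sorted by timestamp (hypothetical helper).
    """
    ts = df["unix_timestamp[ms]"].to_numpy()
    lo = np.searchsorted(ts, target_ts - window_ms, side="left")
    hi = np.searchsorted(ts, target_ts, side="left")
    return df.iloc[lo:hi]

# Window belonging to a target row whose interval starts 10 minutes in.
window = input_window(input_df, start + WINDOW_MS)
print(len(window))
```

Because the lookup is done by timestamp rather than by row position, the window is still found when the target timestamp does not fall exactly on the 10 ms grid.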
How do I use pytorch_forecasting.TimeSeriesDataSet to prepare these two DataFrames this way?
Important Note I:
The unix_timestamp values of the two DataFrames do not necessarily coincide as shown in the example above. For instance, if input_df has a timestamp of 1601596805783, which corresponds to '2020-10-02T00:00:05.783', that does not mean this exact timestamp exists in output_df. The nearest one might be very close, but it is usually off by a couple of milliseconds rather than matching to the exact millisecond.
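The mismatch can be illustrated with a small pandas sketch. It uses pandas.merge_asof with direction="nearest" to pair each output timestamp with the closest input timestamp. The data is synthetic, the 5 ms tolerance is an assumed value, and the extra input_ts column exists only to show which input row was matched.

```python
import numpy as np
import pandas as pd

start = 1609455600000

# Synthetic input on a 10 ms grid (2 minutes of data).
input_df = pd.DataFrame({
    "unix_timestamp[ms]": start + 10 * np.arange(12_000, dtype="int64"),
})
# Copy of the key, so the matched input timestamp stays visible after merging.
input_df["input_ts"] = input_df["unix_timestamp[ms]"]

# Synthetic output whose timestamps are a few milliseconds off the grid.
output_df = pd.DataFrame({
    "unix_timestamp[ms]": [start + 3, start + 59_996],
    "target_value": [9, 8],
})

# Both frames must be sorted by the key; tolerance is in the key's units (ms).
aligned = pd.merge_asof(
    output_df,
    input_df,
    on="unix_timestamp[ms]",
    direction="nearest",
    tolerance=5,  # assumed maximum offset; rows further away get NaN
)
print(aligned)
```

Output rows with no input timestamp within the tolerance would get NaN in input_ts, which makes gaps easy to detect.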
Important Note II:
I thought about simply upsampling output_df to the same resolution by repeating each value within its time interval. However, as far as I can judge, this should distort the prediction result, shouldn't it?
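For reference, this is roughly what the repeated-value frame would look like. It is a sketch on synthetic data, and it assumes that each output_df timestamp marks the start of its 1-minute interval. It only shows the mechanics and does not settle whether the repetition distorts the result.

```python
import numpy as np
import pandas as pd

start = 1609455600000

# Synthetic input on a 10 ms grid (2 minutes of data).
input_df = pd.DataFrame({
    "unix_timestamp[ms]": start + 10 * np.arange(12_000, dtype="int64"),
})
output_df = pd.DataFrame({
    "unix_timestamp[ms]": [start, start + 60_000],
    "target_value": [9, 8],
})

# direction="backward" gives every input row the most recent target value,
# i.e. each target is repeated across its whole 1-minute interval.
upsampled = pd.merge_asof(
    input_df, output_df, on="unix_timestamp[ms]", direction="backward"
)
counts = upsampled["target_value"].value_counts()
print(counts)
```

Each target value ends up on 6000 consecutive rows (60000 ms / 10 ms), so the upsampled frame contains many rows but only one independent target observation per minute.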