Handling prior knowledge about covariates for an RNN

Eyzzle · April 25, 2025, 1:09pm

I am currently building my own dataloader. The objective is to perform time series forecasting.

Note: I could not use the built-in TimeSeriesDataSet from pytorch-forecasting due to the nature of my dataset.

As an example, let’s assume I am forecasting weather, using the following dataframe:

import numpy as np
import pandas as pd

# Toy weather data: 10 time steps, 3 features drawn uniformly from [0, 1).
X = pd.DataFrame(data={
    'temperature': np.random.random(10),
    'pressure': np.random.random(10),
    'humidity': np.random.random(10),
})

print(X.to_markdown())

	temperature	pressure	humidity
0	0.501873	0.741631	0.500776
1	0.639229	0.716319	0.846043
2	0.305061	0.78736	0.2809
3	0.666592	0.241905	0.534717
4	0.29799	0.758383	0.217077
5	0.398248	0.537553	0.524409
6	0.0699319	0.706717	0.74684
7	0.707643	0.821382	0.29689
8	0.620412	0.788375	0.512174
9	0.0802374	0.804594	0.231062

I want to predict the temperature at t+1 using the features at t-7, t-6, …, t.
In addition, let’s assume I have prior knowledge about the pressure data: I know it is relevant only for the 2 days before the prediction (I only need the pressure at times t, t-1, and t-2). Therefore, I do not want to include earlier pressure values, because they would act as noise for the model. My dataset is also rather small, which is why I want every input to be as informative as possible.
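
For reference, the plain windowing (before any special handling of pressure) could be sketched roughly as below. This is only a minimal illustration: the helper name make_windows and its lookback parameter are hypothetical, not part of any library, and the data is random like in the example above.

```python
import numpy as np
import pandas as pd


def make_windows(df, target_col, lookback):
    """Return inputs of shape (n_samples, lookback, n_features) and next-step targets."""
    values = df.to_numpy(dtype=np.float32)
    target = df[target_col].to_numpy(dtype=np.float32)
    inputs, targets = [], []
    # t is the index of the last observed step; the target sits at t + 1.
    for t in range(lookback - 1, len(df) - 1):
        inputs.append(values[t - lookback + 1 : t + 1])
        targets.append(target[t + 1])
    return np.stack(inputs), np.array(targets, dtype=np.float32)


rng = np.random.default_rng(0)
X = pd.DataFrame(rng.random((10, 3)), columns=["temperature", "pressure", "humidity"])

# lookback=8 covers t-7, ..., t; with 10 rows this yields 2 samples.
windows, y = make_windows(X, "temperature", lookback=8)
print(windows.shape, y.shape)  # (2, 8, 3) (2,)
```

Each window keeps all three features for all 8 steps, which is exactly why the older pressure values need some kind of filler.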

Since an RNN expects input of shape ( batch_size x n_timestep x feature_size ), how should I fill the pressure values for the steps (t-7, …, t-3)?
Should I do a simple backfill, where the pressure value at (t-7, …, t-3) is set equal to the pressure value at t-2?
Or should I zero out the values at (t-7, …, t-3)?
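
To make the two options concrete, here is a rough sketch of what each would do to a batch of windows. The function fill_old_steps and its arguments are hypothetical names used only for illustration, and the feature order [temperature, pressure, humidity] is an assumption taken from the dataframe above.

```python
import numpy as np


def fill_old_steps(windows, feature_idx, keep_last, mode):
    """Overwrite all but the last `keep_last` steps of one feature in each window."""
    out = windows.copy()
    # Steps [0, cutoff) lie outside the range considered relevant.
    cutoff = windows.shape[1] - keep_last
    if mode == "zero":
        out[:, :cutoff, feature_idx] = 0.0
    elif mode == "backfill":
        # Repeat the oldest relevant value (the one at t-2) over the earlier steps.
        out[:, :cutoff, feature_idx] = windows[:, cutoff, feature_idx][:, None]
    else:
        raise ValueError(f"unknown mode: {mode}")
    return out


rng = np.random.default_rng(0)
# Shape (batch, steps t-7..t, features); random stand-in data.
windows = rng.random((2, 8, 3)).astype(np.float32)

PRESSURE = 1  # assumed column position of pressure
zeroed = fill_old_steps(windows, feature_idx=PRESSURE, keep_last=3, mode="zero")
backfilled = fill_old_steps(windows, feature_idx=PRESSURE, keep_last=3, mode="backfill")
```

In both cases only the pressure column at steps t-7, …, t-3 changes; temperature and humidity are left untouched. If the features are standardized beforehand, a zero would coincide with the mean pressure, so the two options do not feed the same signal to the model.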

Thanks in advance!