PyTorch LSTM and offsetting data for time-series models

Hello folks.
I was looking at an implementation of the DeepAR model for time-series prediction. The model uses an LSTM and takes in 168 hours of data to predict the next 24 hours, in other words it trains on 7 days of data to predict the 8th day. The time series is also strided by 1 day (24 hours), so each window is 192 (168 + 24) timesteps long and consecutive windows advance by a rolling 24-hour step.
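
For concreteness, here is a minimal sketch of how I understand that rolling-window split (the variable names are mine, not the repo's):

```python
import numpy as np

context_len = 168   # 7 days of hourly history fed to the LSTM
predict_len = 24    # the 8th day we want to forecast
window_size = context_len + predict_len  # 192
stride = 24         # windows advance one day at a time

def make_windows(series: np.ndarray) -> np.ndarray:
    """Slice a 1-D hourly series into overlapping 192-step windows."""
    starts = range(0, len(series) - window_size + 1, stride)
    return np.stack([series[s:s + window_size] for s in starts])

# e.g. 30 days of hourly data -> (30*24 - 192) / 24 + 1 = 23 windows
windows = make_windows(np.arange(30 * 24, dtype=np.float32))
print(windows.shape)  # (23, 192)
```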

Now, the one confusing thing I found was how the data array for training is set up. The folks who created the repo set up the input array as a tensor whose first dimension is the number of time-series windows, second dimension is the sequence length, and third dimension is some additional covariates, so [number_of_windows, 192, number_of_covariates]. HOWEVER, and here is the tricky thing, every time-series window begins with a 0, so only 191 of the steps are actual data. For example, the windows look like this:

| window | sequence                | covariates |
|--------|-------------------------|------------|
| 1      | 0, 1, 2, 3, 4, ..., 191 | 1, 9, 1    |
| 2      | 0, 5, 6, 8, 9, ...      | 1, 9, 2    |
| 3      | 0, 10, 3, 8, 12, ...    | 1, 9, 3    |

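If I follow the repo correctly, that leading 0 seems to come from shifting the target right by one step, so the network is fed z_{t-1} at step t. A rough sketch of that construction (my own names, not the repo's code):

```python
import numpy as np

def lag_with_zero(window: np.ndarray) -> np.ndarray:
    """window: the 192 target values z_1..z_192 for one rolling window."""
    lagged = np.zeros_like(window)
    lagged[1:] = window[:-1]   # input at step t is z_{t-1}; step 0 gets a 0
    return lagged

print(lag_with_zero(np.arange(1.0, 193.0))[:5])  # [0. 1. 2. 3. 4.]
```
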
I asked the developers in a GitHub issue, and they said the following:

> At every time step, z_t is given and the goal is to predict z_{t+1}. In this way, if the window size is 192, we actually need 193 timesteps to cover both given and labels. Therefore, we let the first timestep in the window be 0. There are other setups too.
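
If that explanation is right, a setup I have seen more often is to keep 193 raw timesteps per window and slice the inputs and labels against each other, rather than prepending a 0. Something like this sketch (not the repo's code, just the general idea):

```python
import torch

# One-step-ahead training: the input at step t is z_t, the label is z_{t+1}.
raw = torch.randn(32, 193, 1)        # (batch, 193 timesteps, 1 feature)
inputs  = raw[:, :-1, :]             # z_1 .. z_192  -> shape (32, 192, 1)
targets = raw[:, 1:,  :]             # z_2 .. z_193  -> shape (32, 192, 1)

lstm = torch.nn.LSTM(input_size=1, hidden_size=64, batch_first=True)
proj = torch.nn.Linear(64, 1)

hidden, _ = lstm(inputs)             # (32, 192, 64)
preds = proj(hidden)                 # prediction of z_{t+1} at every step t
loss = torch.nn.functional.mse_loss(preds, targets)
```

Either way the model sees z_t and is scored on z_{t+1}; the difference is only whether the one-step shift is baked into the stored array (as in the repo) or done by slicing at training time.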

So this makes sense, but I have not seen this setup mentioned anywhere before. Is this the right way to set up this kind of prediction model, where z_t predicts z_{t+1}, or is there a more common and consistent way to do it? Is there any documentation about this kind of setup? Any help is appreciated.