Hello pytorch community,

I have a question about handling “teacher forcing” in a seq2seq model while also using a sliding window to improve my predictions.

My seq2seq model consists of an encoder and a decoder. For future time steps, I predict autoregressively with a sliding window of length 3 (so the window covering steps t-2 to t is used to predict step t+1). I also apply teacher forcing at random, with a probability that decreases over the course of training.

When teacher forcing is applied, which time steps in the window should be replaced with ground truth for the next prediction step (t+1)? Just “t”, or also “t-1” and “t-2”?

Example: the window length is 3, with steps 20, 21, 22 as input, and teacher forcing is applied at step 22. When predicting step 23, should teacher forcing replace just step 22, or all of steps 20, 21 and 22?
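
To make the two options concrete, here is a minimal sketch in plain Python. It is not my actual model: `predict_next` is a hypothetical stand-in for the decoder (it just averages the window), and `rollout`, `tf_mask` and `replace_whole_window` are names made up for this illustration. In a real training loop the mask would be sampled at random with a decaying probability, and the values would be tensors rather than floats.

```python
def predict_next(window):
    # Hypothetical stand-in for the decoder: averages the window values.
    return sum(window) / len(window)


def rollout(history, targets, tf_mask, replace_whole_window):
    """Autoregressive rollout with a sliding window and teacher forcing.

    history: the last known values that fill the window (e.g. steps 20, 21, 22)
    targets: ground-truth values for the steps being predicted (e.g. 23, 24, ...)
    tf_mask: one boolean per target; True means teacher forcing at that step
    replace_whole_window: False -> option A (newest step only),
                          True  -> option B (every step in the window)
    """
    window = list(history)
    k = len(window)
    truth = list(history)  # ground truth for every step seen so far
    preds = []
    for target, use_tf in zip(targets, tf_mask):
        pred = predict_next(window)
        preds.append(pred)
        truth.append(target)
        # Default: slide the window and append the model's own prediction.
        window = window[1:] + [pred]
        if use_tf:
            if replace_whole_window:
                # Option B: overwrite the entire window with ground truth.
                window = truth[-k:]
            else:
                # Option A: overwrite only the newest step with ground truth.
                window[-1] = target
    return preds


history = [1.0, 2.0, 3.0]
targets = [10.0, 20.0, 30.0]
tf_mask = [False, True, False]  # teacher forcing only at the second step

preds_a = rollout(history, targets, tf_mask, replace_whole_window=False)
preds_b = rollout(history, targets, tf_mask, replace_whole_window=True)
```

The two options only differ when an earlier step was not teacher forced: with option A the older window entries may still be model predictions, while with option B they are all reset to ground truth. In this toy run the first two predictions are identical and only the third differs.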

Any hint or help would be greatly appreciated!