What is the purpose of a mask in time series forecasting?

Why do we train with the mask and then also use it when making predictions at inference time? (The latter part is clearer to me.) Why can't we train the model and then create a mask that extends an initial period?

Ex:

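```python
import torch
# mask would be the same dimensions except a different length (time axis), filled with 0
mask = torch.zeros(batch_size, horizon, num_features)  # illustrative names; assumes (batch, time, features)
torch.cat((previousTimeTensor, mask), dim=1)  # torch.cat takes a sequence of tensors plus the dim to join on
```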
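Assuming "mask" here means the causal (look-ahead) attention mask used in Transformer-style forecasters, the sketch below contrasts it with the zero-placeholder idea in the example. Part (1) shows why the mask is used during training: with it, one forward pass over a window gives every time step an output that depends only on earlier steps, which is the same view the model has when it forecasts step by step at inference. Part (2) is a minimal version of the approach asked about: append zero placeholders and read the forecast off those positions in a single pass (a one-shot, or direct, multi-step forecast). That also works, but the placeholders need positional information, otherwise they all look identical to self-attention. `causal_mask`, `PlaceholderForecaster`, the sizes, and the random target are hypothetical, chosen only for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)


def causal_mask(n: int) -> torch.Tensor:
    # 0.0 where attention is allowed, -inf above the diagonal:
    # step t may attend to steps 0..t but never to t+1, t+2, ...
    return torch.triu(torch.full((n, n), float("-inf")), diagonal=1)


# --- (1) Causal mask: one masked forward pass never looks ahead ---
d_model, seq_len = 8, 6
layer = nn.TransformerEncoderLayer(d_model, nhead=2, dropout=0.0, batch_first=True).eval()
x = torch.randn(1, seq_len, d_model)
with torch.no_grad():
    out = layer(x, src_mask=causal_mask(seq_len))
    x_future_changed = x.clone()
    x_future_changed[:, 4:] = torch.randn(1, 2, d_model)  # rewrite the last two steps
    out_changed = layer(x_future_changed, src_mask=causal_mask(seq_len))
# Outputs for steps 0..3 are unaffected by the rewritten future steps, so each
# position is trained only on information it would also have at inference time.
print(torch.allclose(out[:, :4], out_changed[:, :4], atol=1e-6))  # True


# --- (2) Zero placeholders: forecast the whole horizon in one pass ---
class PlaceholderForecaster(nn.Module):
    # Hypothetical toy model, not a specific published architecture
    def __init__(self, features: int, context: int, horizon: int, d_model: int = 16):
        super().__init__()
        self.context = context
        self.proj_in = nn.Linear(features, d_model)
        # Learned positions: without them every zero placeholder looks identical
        self.pos = nn.Embedding(context + horizon, d_model)
        self.encoder = nn.TransformerEncoderLayer(d_model, nhead=2, dropout=0.0, batch_first=True)
        self.proj_out = nn.Linear(d_model, features)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        positions = torch.arange(x.size(1), device=x.device)
        h = self.encoder(self.proj_in(x) + self.pos(positions))  # no causal mask here
        return self.proj_out(h)[:, self.context:]  # predictions at the placeholder steps


batch, context, horizon, features = 2, 12, 4, 3
previousTimeTensor = torch.randn(batch, context, features)
placeholder = torch.zeros(batch, horizon, features)  # the zero-filled "mask" from the question
inputs = torch.cat((previousTimeTensor, placeholder), dim=1)
target = torch.randn(batch, horizon, features)  # stand-in for the true future values

model = PlaceholderForecaster(features, context, horizon)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
pred = model(inputs)
loss = nn.functional.mse_loss(pred, target)
opt.zero_grad()
loss.backward()
opt.step()
print(pred.shape)  # torch.Size([2, 4, 3])
```

The trade-off in this sketch: the causal-mask model gets a training signal from every step of every window at once and can forecast any number of steps by feeding its own predictions back in (at the risk of compounding errors), while the placeholder model predicts a fixed horizon in one pass and is tied to the horizon length it was built and trained with.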