I was using a Transformer Encoder to predict future values of a time series. Since I wanted the transformer to take into account the 200 past values and predict the next 50, I used a 250x250 mask in which the last 50 values of each row were -inf, to hide the future values. However, I have now realized that when I set the last 50 values to zero, the Transformer crashes. I tried the complete Transformer (encoder + decoder) and the same thing happens. Does anyone know what could be happening, or whether I am doing something wrong?
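For reference, here is a minimal sketch of the mask described above and how an additive -inf mask hides positions from attention. It assumes the mask is added to the attention scores before the softmax, which is how PyTorch treats a float `mask` passed to `nn.TransformerEncoder`. The helper names (`build_mask`, `masked_softmax`) and the dummy scores are hypothetical and only illustrate the mechanism in plain Python:

```python
import math

PAST, FUTURE = 200, 50
SEQ_LEN = PAST + FUTURE  # 250

def build_mask(past=PAST, future=FUTURE):
    """Hypothetical helper: 250x250 additive mask.

    Every row keeps the first `past` columns visible (0.0) and hides
    the last `future` columns (-inf), as described in the question.
    """
    n = past + future
    return [[0.0 if j < past else -math.inf for j in range(n)] for _ in range(n)]

def masked_softmax(scores, mask_row):
    """Add the mask to raw attention scores, then apply a stable softmax."""
    shifted = [s + m for s, m in zip(scores, mask_row)]
    peak = max(shifted)  # finite, because at least one column is unmasked
    exps = [math.exp(x - peak) for x in shifted]  # exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]

mask = build_mask()
scores = [0.01 * j for j in range(SEQ_LEN)]  # arbitrary dummy scores
weights = masked_softmax(scores, mask[0])
print(sum(weights[PAST:]))     # 0.0: the 50 future positions get no weight
print(round(sum(weights), 6))  # 1.0: all weight goes to the 200 past positions
```

With this mask every row, including the future positions, attends only to the 200 past values. If a row were masked entirely with -inf, `peak` would be -inf and the softmax would produce NaN, so it is worth checking that no row of the mask ends up fully masked.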
Thanks in advance to everyone