Why is my training error going up?

The model predicts only the next value (a scalar), not an entire sequence.
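
To make the setup concrete, here is a minimal sketch of next-value prediction framed as supervised pairs. It is not the actual training code: the helper name `make_windows`, the window length of 3, and the toy series are all assumptions made up for illustration.

```python
def make_windows(series, window):
    """Split a series into (input window, next scalar value) pairs.

    Hypothetical helper for illustration only.
    """
    inputs, targets = [], []
    for start in range(len(series) - window):
        # The input is `window` consecutive values.
        inputs.append(series[start:start + window])
        # The target is the single value that follows the window.
        targets.append(series[start + window])
    return inputs, targets


# Toy data (assumed); a real series would come from the dataset.
series = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
inputs, targets = make_windows(series, window=3)

for x, y in zip(inputs, targets):
    print(x, "->", y)
```

Each target here is one number, so the loss is computed on a single scalar per example rather than on a whole output sequence.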