LSTM forecasting model converges on persistence algorithm

We are currently working with PyTorch on an LSTM model for forecasting. However, we find that our model always seems to converge on the persistence algorithm whenever we train. Essentially, this means the model simply copies the previous time step (t-1) as its prediction for the next time step (t). Does anyone know how we can prevent this from occurring?
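For reference, here is what we mean by the persistence baseline our model keeps collapsing to (a minimal sketch, assuming tensors shaped `(batch, seq_len, features)`; the function name is just for illustration):

```python
import torch

def persistence_forecast(x):
    # Persistence baseline: the prediction for time step t is
    # simply the observed value at time step t-1.
    # x: (batch, seq_len, features)
    # Returns predictions aligned with targets x[:, 1:, :].
    return x[:, :-1, :]

x = torch.arange(12.).reshape(1, 6, 2)
pred = persistence_forecast(x)
print(pred.shape)  # torch.Size([1, 5, 2])
```

A trained model whose outputs are (nearly) indistinguishable from this function has learned nothing beyond copying its last input.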

Our current LSTM architecture is as follows:
Input size: 2
Hidden dim: 500
Output size: 2
Num layers: 1
LSTM dropout: 0.05

And we have a fully connected layer (FCL) at the end with a 0.05 dropout. We appreciate any help! Thanks!
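In PyTorch terms, our setup looks roughly like this (a sketch, not our exact code; the class name is arbitrary). One thing worth noting: PyTorch only applies the LSTM's `dropout` argument between stacked layers, so with `num_layers=1` it has no effect.

```python
import torch
import torch.nn as nn

class Forecaster(nn.Module):
    # Sketch of the architecture described above.
    def __init__(self, input_size=2, hidden_dim=500, output_size=2, num_layers=1):
        super().__init__()
        # PyTorch ignores LSTM dropout when num_layers == 1 (it only
        # applies between stacked layers), so we gate it here to avoid
        # the UserWarning.
        self.lstm = nn.LSTM(input_size, hidden_dim, num_layers,
                            batch_first=True,
                            dropout=0.05 if num_layers > 1 else 0.0)
        self.dropout = nn.Dropout(0.05)   # dropout before the final FCL
        self.fc = nn.Linear(hidden_dim, output_size)

    def forward(self, x):
        # x: (batch, seq_len, input_size)
        out, _ = self.lstm(x)             # (batch, seq_len, hidden_dim)
        out = self.dropout(out)
        return self.fc(out)               # (batch, seq_len, output_size)

model = Forecaster()
y = model(torch.randn(4, 10, 2))
print(y.shape)  # torch.Size([4, 10, 2])
```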