How to backpropagate a loss through a time-series RNN?

If you are trying to backpropagate based on a reward function on a time-series problem, you should look at reinforcement learning with an algorithm such as DQN or PPO. Such a setup stores the states, actions, and rewards it observes, trains the model on that stored data, and then repeats the cycle by collecting new states, actions, and rewards.
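
To make the collect, train, repeat cycle concrete, here is a minimal sketch using only the Python standard library. It is not a DQN: a small Q-table stands in for the neural network so the loop structure stays visible, and the update is plain one-step Q-learning on stored transitions. The toy chain environment and every name in it (`step`, `collect_episode`, `train_on_batch`, the hyperparameter values) are my own assumptions for illustration, not part of any library.

```python
import random
from collections import deque

# Hypothetical toy problem: a chain of states 0..4 with the goal at state 4.
N_STATES = 5
ACTIONS = (0, 1)  # 0 = step left, 1 = step right
GAMMA, ALPHA, EPSILON = 0.9, 0.5, 0.2  # assumed values, not tuned

def step(state, action):
    """Toy environment: reward 1.0 only when the goal state is reached."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def choose_action(q, state, epsilon):
    """Epsilon-greedy action selection with random tie-breaking."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    best = max(q[state])
    return random.choice([a for a in ACTIONS if q[state][a] == best])

def collect_episode(q, buffer, max_steps=50):
    """Run one episode and store (state, action, reward, next_state, done)."""
    state = 0
    for _ in range(max_steps):
        action = choose_action(q, state, EPSILON)
        next_state, reward, done = step(state, action)
        buffer.append((state, action, reward, next_state, done))
        state = next_state
        if done:
            break

def train_on_batch(q, buffer, batch_size=32):
    """One-step Q-learning update on a random minibatch of stored transitions."""
    batch = random.sample(list(buffer), min(batch_size, len(buffer)))
    for state, action, reward, next_state, done in batch:
        target = reward if done else reward + GAMMA * max(q[next_state])
        q[state][action] += ALPHA * (target - q[state][action])

def run(iterations=200, seed=0):
    random.seed(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q-table standing in for the model
    buffer = deque(maxlen=1000)                # replay buffer of past transitions
    for _ in range(iterations):
        collect_episode(q, buffer)  # gather new states, actions, rewards
        train_on_batch(q, buffer)   # update the model from the stored data
    return q

q_table = run()
# Greedy action for each non-terminal state after training.
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(greedy_policy)
```

In a real DQN the Q-table would be replaced by a network, and `train_on_batch` would compute a loss between predicted and target Q-values and call the optimizer. The outer loop in `run` would keep the same shape.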

Here is an example of a DQN:
https://pytorch.org/tutorials/intermediate/reinforcement_q_learning.html