How to implement a GRU-like structure?

I have a pretrained model A that takes x_{t-1} and p_{t} and predicts x_{t}. Since it is a PyTorch neural network, it is differentiable.
I want to run this model along P = {p_i} (a sequence predicted by the last layer) to produce X = {x_i}.
However, to predict x_{t}, the model needs its own output from the previous frame, x_{t-1}. So I implemented this structure with a for loop, which makes the process quite slow.
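A minimal sketch of the loop described above, assuming model A maps (x_{t-1}, p_t) to x_t on batch-first tensors. `StepModel` is a hypothetical stand-in for the pretrained A, and `rollout`, the tensor shapes, and the initial frame `x0` are illustrative assumptions, not part of the original setup.

```python
import torch
import torch.nn as nn


class StepModel(nn.Module):
    """Hypothetical stand-in for the pretrained model A: (x_{t-1}, p_t) -> x_t."""

    def __init__(self, x_dim: int, p_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x_dim + p_dim, hidden),
            nn.Tanh(),
            nn.Linear(hidden, x_dim),
        )

    def forward(self, x_prev: torch.Tensor, p_t: torch.Tensor) -> torch.Tensor:
        # Concatenate the previous frame with the current conditioning input.
        return self.net(torch.cat([x_prev, p_t], dim=-1))


def rollout(model: nn.Module, x0: torch.Tensor, P: torch.Tensor) -> torch.Tensor:
    """Unroll the model autoregressively along P.

    x0: (batch, x_dim) assumed initial frame
    P:  (batch, T, p_dim) sequence from the previous layer
    returns X: (batch, T, x_dim)
    """
    xs = []
    x_prev = x0
    for t in range(P.shape[1]):
        # x_t depends on x_{t-1}, so the time steps must run one after another.
        x_prev = model(x_prev, P[:, t])
        xs.append(x_prev)
    # Stack once at the end instead of writing in-place into a preallocated tensor.
    return torch.stack(xs, dim=1)


torch.manual_seed(0)
model = StepModel(x_dim=3, p_dim=5)
x0 = torch.zeros(4, 3)
P = torch.randn(4, 10, 5, requires_grad=True)
X = rollout(model, x0, P)
X.sum().backward()  # gradients flow back through every step to P
```

Putting several independent sequences in one batch, as here, spreads the per-step overhead across them; the time dimension itself still has to be walked step by step, just as a GRU cell is applied once per time step.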

Is there a way to accelerate the process?