Experience replay for REINFORCE

How can I implement experience replay for REINFORCE?
I have an LSTM that, given an input, outputs a sequence of actions. I use REINFORCE, as in the link above, to sample each action and assign it a reward, but training it online doesn’t work well enough.
So is there some way I can use experience replay here?

Policy gradient methods like REINFORCE are usually on-policy: the gradient estimate assumes the samples were drawn from the current policy, so they cannot be updated directly from an experience replay buffer.
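
To make that concrete, here is a minimal sketch on a toy problem. Everything in it is an assumption for illustration: a hypothetical 3-armed bandit, a softmax policy over raw logits, and helper names I made up (`expected_reinforce_grad` and friends). It computes the expected REINFORCE gradient exactly, so there is no sampling noise, and compares three cases: actions drawn from the current policy, actions drawn from an older policy (what a replay buffer would hold), and the older actions reweighted by an importance ratio. Importance weighting is not part of plain REINFORCE; it is shown only to illustrate where the mismatch comes from.

```python
import math


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def grad_log_prob(probs, action):
    # Gradient of log softmax(logits)[action] w.r.t. the logits:
    # one_hot(action) - probs.
    return [(1.0 if i == action else 0.0) - p for i, p in enumerate(probs)]


def expected_reinforce_grad(sampling_probs, policy_probs, rewards, reweight=False):
    # Expected value of  w * r(a) * grad log pi(a)  when a ~ sampling_probs
    # and the gradient is taken for the policy with policy_probs.
    # With reweight=True, w is the importance ratio pi(a) / mu(a); otherwise w = 1.
    n = len(rewards)
    grad = [0.0] * n
    for a in range(n):
        w = policy_probs[a] / sampling_probs[a] if reweight else 1.0
        g = grad_log_prob(policy_probs, a)
        for i in range(n):
            grad[i] += sampling_probs[a] * w * rewards[a] * g[i]
    return grad


# Hypothetical setup: only arm 0 pays a reward.
rewards = [1.0, 0.0, 0.0]
old_probs = softmax([0.0, 0.0, 0.0])  # policy that filled the replay buffer
new_probs = softmax([2.0, 0.0, 0.0])  # current policy after some updates

on_policy = expected_reinforce_grad(new_probs, new_probs, rewards)
stale = expected_reinforce_grad(old_probs, new_probs, rewards)
corrected = expected_reinforce_grad(old_probs, new_probs, rewards, reweight=True)

print("on-policy :", on_policy)
print("stale     :", stale)
print("corrected :", corrected)
```

In this toy case the stale gradient differs from the on-policy one, and the reweighted version matches it. With real sampled trajectories the importance ratios can have high variance, so treat this as an explanation of the bias rather than a recipe.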


Your answer seems to suggest that on-policy methods do not use experience replay. Could you clarify?
Suppose I replace deep Q-learning with SARSA (which is on-policy) and use experience replay. Wouldn’t that be on-policy experience replay?
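
For reference, the two update targets being compared can be sketched as below. This is a tabular toy with made-up Q-values and hypothetical function names, not code from any library. The only difference is how the next state is bootstrapped: Q-learning takes the max over actions, while SARSA uses the next action that was actually taken. With a replay buffer, that stored next action was chosen by whatever policy was in use when the transition was recorded.

```python
def q_learning_target(q, reward, next_state, gamma, done):
    # Bootstraps from the greedy action in next_state, regardless of
    # which action the behaviour policy actually took there.
    if done:
        return reward
    return reward + gamma * max(q[next_state])


def sarsa_target(q, reward, next_state, next_action, gamma, done):
    # Bootstraps from the action that was actually taken in next_state.
    if done:
        return reward
    return reward + gamma * q[next_state][next_action]


# Hypothetical Q-table: q[state][action].
q = [[0.0, 0.0],
     [1.0, 5.0]]
gamma = 0.9

# A stored transition (s, a, r, s', a', done). The next action a' = 0 was
# picked by the policy at the time the transition was recorded.
state, action, reward, next_state, next_action, done = 0, 1, 1.0, 1, 0, False

ql = q_learning_target(q, reward, next_state, gamma, done)
sa = sarsa_target(q, reward, next_state, next_action, gamma, done)
print("Q-learning target:", ql)
print("SARSA target     :", sa)
```

The two targets differ whenever the stored next action is not the greedy one under the current Q-table, which is the situation the question is asking about.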