Is there example code for recurrent policy gradients? Is it simply a matter of replacing the MLP with an RNN?
I’ve got examples of recurrent policy gradients in a newly made repo for A3C with continuous action spaces:
You can also look at the older repo for discrete action spaces on Atari:
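
On the "simply replacing MLP with RNN" part: roughly yes, but the policy also has to carry a hidden state from step to step and reset it at the start of each episode. Below is a minimal sketch of that idea in plain Python, assuming a discrete action space and a tiny Elman-style RNN. It is not the code from the repos mentioned above; `RecurrentPolicy`, `rollout`, and the weight names are hypothetical, and only the forward pass is shown.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class RecurrentPolicy:
    """Hypothetical Elman-RNN policy over discrete actions (forward pass only)."""

    def __init__(self, obs_dim, hidden_dim, n_actions, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.hidden_dim = hidden_dim
        self.W_xh = mat(hidden_dim, obs_dim)     # observation -> hidden
        self.W_hh = mat(hidden_dim, hidden_dim)  # hidden -> hidden (the recurrent part)
        self.W_ha = mat(n_actions, hidden_dim)   # hidden -> action logits

    def initial_state(self):
        # Fresh hidden state, to be used at the start of every episode.
        return [0.0] * self.hidden_dim

    def step(self, obs, h):
        # h_t = tanh(W_xh x_t + W_hh h_{t-1}); an MLP policy would have no h term.
        new_h = [math.tanh(dot(wx, obs) + dot(wh, h))
                 for wx, wh in zip(self.W_xh, self.W_hh)]
        probs = softmax([dot(w, new_h) for w in self.W_ha])
        return probs, new_h


def rollout(policy, observations, seed=0):
    """Run the policy over one episode's observations, threading the hidden state."""
    rng = random.Random(seed)
    h = policy.initial_state()
    actions, log_probs = [], []
    for obs in observations:
        probs, h = policy.step(obs, h)  # h is passed back in on the next step
        a = rng.choices(range(len(probs)), weights=probs)[0]
        actions.append(a)
        log_probs.append(math.log(probs[a]))
    return actions, log_probs


policy = RecurrentPolicy(obs_dim=2, hidden_dim=4, n_actions=3)
actions, log_probs = rollout(policy, [[1.0, 0.5]] * 5)
print(actions, log_probs)
```

The policy-gradient loss is still built from the stored log-probabilities weighted by returns or advantages. In an autodiff framework the gradient then flows back through the hidden state across time steps, which is the main practical difference from the MLP case.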