This is in the context of model-based reinforcement learning. Say I have some reward at time T, and I want to do truncated backpropagation through the network rollout. What is the best way to do this? Are there any good examples out there? I haven't managed to find much.
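For concreteness, here is a minimal sketch of what I have in mind: roll a learned dynamics model forward, accumulate reward, and detach the state every K steps so gradients flow at most K steps back through the rollout. All model names, sizes, and the random "policy" are placeholders, not my actual setup:

```python
import torch
import torch.nn as nn

state_dim, action_dim, K, T = 4, 2, 5, 20  # K = truncation window, T = horizon

# Toy learned dynamics model and reward head (placeholders).
dynamics = nn.Sequential(
    nn.Linear(state_dim + action_dim, 32),
    nn.Tanh(),
    nn.Linear(32, state_dim),
)
reward_head = nn.Linear(state_dim, 1)
optimizer = torch.optim.Adam(
    list(dynamics.parameters()) + list(reward_head.parameters()), lr=1e-3
)

state = torch.zeros(1, state_dim)
total_reward = torch.zeros(1)

for t in range(T):
    action = torch.randn(1, action_dim)  # stand-in for a policy output
    # Residual state update through the dynamics model.
    state = state + dynamics(torch.cat([state, action], dim=-1))
    total_reward = total_reward + reward_head(state).squeeze(-1)
    # Truncate BPTT: cut the graph every K steps so gradients
    # stop flowing further back through the rollout.
    if (t + 1) % K == 0:
        state = state.detach()

loss = -total_reward.mean()  # maximize accumulated predicted reward
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

The key line is `state = state.detach()`: the forward values are unchanged, but the autograd graph is cut there, so each `backward()` only traverses at most K rollout steps.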
I followed the pseudocode for the non-truncated BPTT in this conversation. The network trains, but I have the feeling that gradients are not flowing through time. I've posted my training code for the network.
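To check whether gradients actually flow through time, I've been using a small sanity check along these lines (a toy GRU setup, not my actual model): if the loss at the final step backpropagates through the whole rollout, the very first input should receive a gradient.

```python
import torch
import torch.nn as nn

# Toy recurrent rollout: 10 steps of a GRU cell (placeholder model).
cell = nn.GRUCell(input_size=3, hidden_size=8)
h = torch.zeros(1, 8)
first_input = torch.randn(1, 3, requires_grad=True)

x = first_input
for _ in range(10):
    h = cell(x, h)
    x = torch.randn(1, 3)  # later inputs don't need gradients

loss = h.pow(2).sum()
loss.backward()

# If gradient flows through all 10 steps, the first input gets a grad;
# a nonzero norm here indicates the graph was not cut somewhere.
grad_norm = first_input.grad.abs().sum().item()
print(grad_norm)
```

If `first_input.grad` comes back `None` (or exactly zero), something in the loop is breaking the graph, e.g. an accidental `.detach()`, `.item()`, or rebuilding a tensor from `.data`.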