Implementing Truncated Backpropagation Through Time

Thank you for the code! I tried implementing this solution for an LSTM architecture but I can’t get the gradient to backpropagate through the hidden states. I give more details about the problem in this question, any help would be really appreciated!