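Generally you do not need to backprop through hidden states carried over from previous iterations. Detaching them from the graph with `.detach()` keeps their values but stops autograd from tracking their history, which reduces its memory and time consumption. Take a look at the thread "Time/Memory keeps increasing at every iteration".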
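To make this concrete, here is a minimal sketch of detaching the hidden state between iterations. The model (`nn.GRU` plus a linear `head`), the tensor sizes, and the random data are all made up for illustration and are not taken from the original question.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy setup: sizes and data are assumptions for illustration.
rnn = nn.GRU(input_size=4, hidden_size=8, batch_first=True)
head = nn.Linear(8, 1)
optimizer = torch.optim.SGD(
    list(rnn.parameters()) + list(head.parameters()), lr=0.01
)

hidden = torch.zeros(1, 2, 8)  # (num_layers, batch, hidden_size)
losses = []

for step in range(5):
    x = torch.randn(2, 3, 4)  # (batch, seq_len, input_size)
    y = torch.randn(2, 3, 1)

    # Keep the hidden state's values but cut its link to the previous
    # iteration's graph, so backward() only traverses the current chunk.
    hidden = hidden.detach()

    out, hidden = rnn(x, hidden)
    loss = nn.functional.mse_loss(head(out), y)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
```

If the `hidden.detach()` line is removed, each `backward()` call has to reach back through every earlier iteration. In a loop like this that typically either raises an error about backwarding through the graph a second time or, if the graph is retained, makes memory and time grow with every step.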