LSTM - question regarding hidden states

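Generally, you do not need to backpropagate through hidden states carried over from a previous iteration. Detaching those hidden states from the graph reduces autograd's memory and time consumption, because otherwise the graph keeps growing with every iteration. Take a look at the thread "Time/Memory keeps increasing at every iteration".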
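Here is a minimal sketch of what detaching looks like in a training loop. The sizes, the random data, the `head` layer and the optimizer are made up for illustration and are not from your code; the only part that matters is the `detach()` call before the state is reused.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes, chosen only for illustration.
batch_size, seq_len, input_size, hidden_size = 4, 5, 8, 16

lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
head = nn.Linear(hidden_size, 1)  # hypothetical output layer
optimizer = torch.optim.SGD(
    list(lstm.parameters()) + list(head.parameters()), lr=0.01
)

# Initial hidden and cell state, each of shape (num_layers, batch, hidden_size).
hidden = (
    torch.zeros(1, batch_size, hidden_size),
    torch.zeros(1, batch_size, hidden_size),
)

losses = []
for step in range(3):
    # Random stand-in data; a real loop would read from a DataLoader.
    x = torch.randn(batch_size, seq_len, input_size)
    y = torch.randn(batch_size, 1)

    # Keep the state values, but cut the link to the previous iteration's graph.
    hidden = tuple(h.detach() for h in hidden)

    optimizer.zero_grad()
    out, hidden = lstm(x, hidden)
    loss = nn.functional.mse_loss(head(out[:, -1]), y)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
```

If you remove the `detach()` line, the second call to `loss.backward()` would try to go back through the first iteration's graph, which has already been freed, and autograd raises an error. Passing `retain_graph=True` avoids the error, but then every iteration backpropagates through all earlier ones, which is the growing time and memory usage mentioned above.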
