LSTM - question regarding hidden states

Hello everyone,

I’m implementing an LSTM for a rather particular time-series problem: I have a dynamical system that advances in time, and the LSTM analyzes its state and makes predictions.

My question is about a bug I had when I started working on the code, which I solved by adding this line:

hidden = tuple([each.data for each in hidden])

I don’t really understand why this is needed or what it does, and I cannot find a proper explanation anywhere. If you can point me to some resource, I would be very grateful.

Thank you in advance.

This prevents gradient backpropagation through the hidden state, like detach().
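
A minimal sketch of what that line does, with illustrative layer sizes and tensor shapes (none of these names come from the original code): accessing .data on each hidden tensor returns a view that is cut off from the autograd graph, which is effectively the same as calling detach() on it.

import torch
import torch.nn as nn

# Toy LSTM step whose hidden state would be re-used in later iterations.
lstm = nn.LSTM(input_size=4, hidden_size=8, batch_first=True)
x = torch.randn(2, 5, 4)               # (batch, seq_len, features)

out, hidden = lstm(x)                  # hidden is a tuple (h_n, c_n)

# Both of these produce tensors that share storage with h_n/c_n but are
# disconnected from the graph, so a later backward() will not propagate
# through this step:
hidden_data   = tuple(h.data for h in hidden)      # what the quoted line does
hidden_detach = tuple(h.detach() for h in hidden)  # the usually preferred form

print(hidden_data[0].requires_grad)    # False
print(hidden_detach[0].requires_grad)  # False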

And why do we want to do that? In fact, I do want to backpropagate through time to learn how the dynamics evolve…

That was an answer about what your code does; I can’t answer why you are using it.


Sure, the fact is that I see this used all over tutorials, but I don’t understand why.

If I need to translate a sentence like “I am going to the park”, I can’t do the detach after each word, right? But maybe I can do the detach at the end of the sentence, before translating a new one, assuming information from the previous sentence has nothing to do with the next one? Or do I need to do the detach after every backward() step? The question, in general, is when you should do the detach.

Generally you would not need to backprop through hidden states across iterations. Detaching the hidden states from the graph reduces autograd’s memory and time consumption. Take a look at Time/Memory keeps increasing at every iteration
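
To make the usual pattern concrete, here is a minimal sketch of truncated backpropagation through time on a toy next-step prediction task; the model, sizes, and data are all illustrative assumptions, not code from this thread. Gradients flow within each chunk, and the hidden state is detached between chunks so the graph from earlier chunks is not kept around.

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=1, hidden_size=16, batch_first=True)
head = nn.Linear(16, 1)
opt = torch.optim.Adam(list(lstm.parameters()) + list(head.parameters()), lr=1e-3)

# Toy series: predict the next value of a sine wave.
series = torch.sin(torch.linspace(0, 50, 1000)).view(1, -1, 1)  # (1, T, 1)
chunk = 50
hidden = None

for start in range(0, series.size(1) - chunk - 1, chunk):
    x = series[:, start:start + chunk]
    y = series[:, start + 1:start + chunk + 1]

    out, hidden = lstm(x, hidden)
    loss = nn.functional.mse_loss(head(out), y)

    opt.zero_grad()
    loss.backward()
    opt.step()

    # Detach so the next chunk starts from the same state *values*, but the
    # graph built for previous chunks is dropped. Without this, memory grows
    # every iteration and backward() eventually fails on an already-freed graph.
    hidden = tuple(h.detach() for h in hidden)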
