LSTM - question regarding hidden states

Hello everyone,

I’m implementing an LSTM for a rather particular time-series problem: I have a dynamical system that advances in time, and the LSTM analyzes its state and makes predictions.

My question is about a bug I had when I started working on the code, which I solved by adding this line:

hidden = tuple([each.data for each in hidden])

I don’t really understand why this is needed or what it does, and I cannot find a proper explanation anywhere. If you can point me to some resource, I would be very grateful.

Thank you in advance.

This prevents gradient backpropagation through the hidden state, like detach().
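
A minimal sketch of what that line does, with illustrative layer sizes and tensor shapes (none of these names come from the original code): accessing .data on each hidden tensor returns a view that is cut off from the autograd graph, which is effectively the same as calling detach() on it.

import torch
import torch.nn as nn

# Toy LSTM step whose hidden state would be re-used in later iterations.
lstm = nn.LSTM(input_size=4, hidden_size=8, batch_first=True)
x = torch.randn(2, 5, 4)               # (batch, seq_len, features)

out, hidden = lstm(x)                  # hidden is a tuple (h_n, c_n)

# Both of these produce tensors that share storage with h_n/c_n but are
# disconnected from the graph, so a later backward() will not propagate
# through this step:
hidden_data   = tuple(h.data for h in hidden)      # what the quoted line does
hidden_detach = tuple(h.detach() for h in hidden)  # the usually preferred form

print(hidden_data[0].requires_grad)    # False
print(hidden_detach[0].requires_grad)  # False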

And why do we want to do that? In fact, I do want to backpropagate through time to learn how the dynamics evolve…

That was an answer about what your code does; I can’t answer why you are using it.


Sure, the fact is that I see this used all over tutorials, but I don’t understand why.

If I need to translate a sentence like “I am going to the park”, I can’t do the detach after each word, right? But maybe I can do the detach at the end of the sentence, before translating a new one, assuming information from the previous sentence has nothing to do with the next one? Or do I need to do the detach after every backward() step? The question, in general, is when you should do the detach.

Generally you would not need to backprop through hidden states across iterations. Detaching the hidden states from the graph reduces autograd’s memory and time consumption. Take a look at Time/Memory keeps increasing at every iteration
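
To make the usual pattern concrete, here is a minimal sketch of truncated backpropagation through time on a toy next-step prediction task; the model, sizes, and data are all illustrative assumptions, not code from this thread. Gradients flow within each chunk, and the hidden state is detached between chunks so the graph from earlier chunks is not kept around.

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=1, hidden_size=16, batch_first=True)
head = nn.Linear(16, 1)
opt = torch.optim.Adam(list(lstm.parameters()) + list(head.parameters()), lr=1e-3)

# Toy series: predict the next value of a sine wave.
series = torch.sin(torch.linspace(0, 50, 1000)).view(1, -1, 1)  # (1, T, 1)
chunk = 50
hidden = None

for start in range(0, series.size(1) - chunk - 1, chunk):
    x = series[:, start:start + chunk]
    y = series[:, start + 1:start + chunk + 1]

    out, hidden = lstm(x, hidden)
    loss = nn.functional.mse_loss(head(out), y)

    opt.zero_grad()
    loss.backward()
    opt.step()

    # Detach so the next chunk starts from the same state *values*, but the
    # graph built for previous chunks is dropped. Without this, memory grows
    # every iteration and backward() eventually fails on an already-freed graph.
    hidden = tuple(h.detach() for h in hidden)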
