Help clarifying repackage_hidden in word_language_model

(Wenchen Li) #1

In the example of word_language_model, we have

def repackage_hidden(h):
    """Wraps hidden states in new Variables, to detach them from their history."""
    if type(h) == Variable:
        return Variable(h.data)
    else:
        return tuple(repackage_hidden(v) for v in h)

I don’t think I fully understand what the “history” includes. Can somebody help clarify this?


(Adam Paszke) #2

Every variable has a .creator attribute that is an entry point to a graph, that encodes the operation history. This allows autograd to replay it and differentiate each op. So each hidden state will have a reference to some graph node that has created it, but in that example you’re doing BPTT, so you never want to backprop to it after you finish the sequence. To get rid of the reference, you have to take out the tensor containing the hidden state and wrap it in a fresh Variable, that has no history (is a graph leaf). This allows the previous graph to go out of scope and free up the memory for next iteration.
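To make this concrete, here is a minimal sketch of the truncated-BPTT pattern described above, written with modern tensors and `.detach()` in place of `Variable` (the layer sizes and chunk length are illustrative, not taken from the example):

```python
import torch

def repackage(h):
    """Detach the hidden state from its graph so the old graph can be freed."""
    if isinstance(h, torch.Tensor):
        return h.detach()
    return tuple(repackage(v) for v in h)

rnn = torch.nn.LSTM(input_size=4, hidden_size=8)
hidden = (torch.zeros(1, 1, 8), torch.zeros(1, 1, 8))

for step in range(3):
    x = torch.randn(5, 1, 4)    # one chunk of the sequence
    hidden = repackage(hidden)  # cut the reference to the previous chunk's graph
    out, hidden = rnn(x, hidden)
    loss = out.sum()
    loss.backward()             # backprop stops at the repackaged hidden state
```

Without the `repackage` call, each chunk’s graph would keep a reference to the previous chunk’s graph, so memory use would grow with the number of chunks instead of staying constant.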

(James Bradbury) #3

I was going to add that .detach() does the same thing, but I checked the code and realized that I’m not at all sure about the semantics of var2 = var1.detach() vs var2 = Variable(var1.data).

(Adam Paszke) #4

Right now the difference is that .detach() still retains the reference, but it should be fixed.

It will change once more when we add lazy execution. In eager mode, it will stay as is (always discard the .creator and mark as not requiring grad). In lazy mode, var1.detach() won’t trigger the compute and will save the reference, while Variable(var1.data) will trigger it, because you’re accessing the data.
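In current (eager) PyTorch, the semantics ended up as follows, sketched here with modern tensors: detach() returns a new tensor with no history that still shares storage with the original.

```python
import torch

t = torch.ones(3, requires_grad=True)
y = t * 2
d = y.detach()  # no grad_fn, but the same underlying storage as y
```

Because the storage is shared, in-place writes to `d` are visible through `y` (and autograd will complain if such a write invalidates a saved value needed for backward).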

(Yihong Chen) #5

So we do not need to repackage the hidden state when making predictions, since we don’t do BPTT?
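Right: at inference time the usual pattern (in modern PyTorch; the layer sizes below are illustrative) is to run under torch.no_grad(), so no graph is built and there is no history to detach from in the first place:

```python
import torch

rnn = torch.nn.LSTM(input_size=4, hidden_size=8)
hidden = (torch.zeros(1, 1, 8), torch.zeros(1, 1, 8))

with torch.no_grad():
    # No operation history is recorded here, so the hidden state
    # never accumulates a graph between prediction steps.
    out, hidden = rnn(torch.randn(5, 1, 4), hidden)
```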

(jdhao) #6

For any latecomers: the Variable object no longer has a creator attribute; it has been renamed to grad_fn. You can see here for more information.