Help clarifying repackage_hidden in word_language_model

Every Variable has a .creator attribute that is an entry point into the graph encoding its operation history. This lets autograd replay the history and differentiate each op. So each hidden state holds a reference to the graph node that created it, but in that example you're doing truncated BPTT, so you never want to backprop through that history once you finish the sequence. To get rid of the reference, you take the tensor containing the hidden state (h.data) and wrap it in a fresh Variable that has no history (i.e. is a graph leaf). This allows the previous graph to go out of scope and frees the memory for the next iteration.
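
For reference, here's a minimal sketch of what such a repackage function looks like with the Variable API from that era of PyTorch (on recent versions you'd just call `h.detach()` instead):

```python
from torch.autograd import Variable

def repackage_hidden(h):
    """Wrap hidden states in new Variables to detach them from their history."""
    if isinstance(h, Variable):
        # Variable(h.data) shares the same underlying tensor but has no
        # creator, so backprop stops here and the old graph can be freed.
        return Variable(h.data)
    else:
        # An LSTM returns a tuple (h, c); repackage each element recursively.
        return tuple(repackage_hidden(v) for v in h)
```

You'd call `hidden = repackage_hidden(hidden)` at the start of each BPTT segment, so gradients only flow back through the current segment.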
