Hi! I want to implement the following in PyTorch.
For example, with a sequence of hidden layers, take the hidden layers inside some window, compute the loss, and update the weights.
Then slide the window by one step, compute the loss, and update the weights for that window.
Repeat.

I think this can be done by holding on to the computation graph during backprop instead of freeing it, and freeing a part of the graph only when its hidden layer falls outside the window.
It is possible to keep the computation graph alive during backward by passing retain_graph=True, but I don't know how to free an arbitrary part of the graph that is being held.
Can you help me? Or is there a better solution?
By the way, recomputing the hidden layers from scratch at every update is not realistic given the data size.

Also, can you tell me which class holds the computation graph?
And is it possible to display the computation graph?

There is currently no easy way to inspect the computation graph from Python.
The only thing that keeps it alive, though, is the output Variable it was used to compute. So keeping the computation graph used to compute a Variable is as easy as storing that Variable somewhere, such as in a list. When you remove it from the list (and no other Python variable references it), the computation graph will be destroyed.
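Although there is no built-in visualizer, the raw structure can be walked by hand by following grad_fn pointers. A minimal sketch (assuming a reasonably recent PyTorch, where autograd state lives on tensors):

```python
import torch

x = torch.randn(3, requires_grad=True)
y = (x * 2).sum()

# The graph hangs off the output tensor via .grad_fn; each node links to
# its inputs' nodes through .next_functions.
def show(fn, depth=0):
    if fn is None:
        return
    print("  " * depth + type(fn).__name__)
    for nxt, _ in fn.next_functions:
        show(nxt, depth + 1)

show(y.grad_fn)  # e.g. SumBackward0 -> MulBackward0 -> AccumulateGrad
```

Dropping every reference to `y` (and to anything computed from it) is what lets this chain of nodes be garbage-collected.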

Thank you for your reply.
I see. By the way, is there a way to explicitly free part of a computation graph?
E.g., when a variable has a computation graph built from n hidden layers, free the graph for everything up to layer n - 10?

You could keep the hidden Variables in a Python list and do something like this…

new_hidden = some_calculation(...)
hiddens.append(new_hidden)
while len(hiddens) > max_history:
    hiddens.pop(0)
hiddens[0].detach_()  # stops backpropagation from going any further back
# do stuff with hiddens

As far as I can tell from its source code, .detach_() removes any prior history a Variable has, but keeps the graph connections to any Variables calculated using its value.
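As a minimal illustration of what the in-place detach does to a single tensor (detaching it before building anything new on top of it): its own history disappears, and a fresh graph started from it stops there:

```python
import torch

x = torch.ones(1, requires_grad=True)
h = x * 2
h.detach_()            # h's own history is gone; it is now a leaf
assert h.grad_fn is None

h.requires_grad_()     # keep building a new graph from h onward
y = (h * 3).sum()
y.backward()
assert x.grad is None  # gradient stopped at the detach point
assert h.grad.item() == 3.0
```

Here backpropagation from y reaches h but never x, since the detach severed h from the multiplication that produced it.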