Debugging "RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed."

I am getting the following error:

RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.

in a fairly complex neural network.
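For context, the error can be reproduced with a minimal snippet like this (not my actual network, just the bare pattern of calling `backward()` twice on the same graph):

```python
import torch

x = torch.randn(3, requires_grad=True)
y = (x * 2).sum()

y.backward()  # the first backward pass frees the graph's intermediate buffers
y.backward()  # raises: Trying to backward through the graph a second time
```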

Is there any way to debug this kind of error, to see where it is being thrown?
Using register_backward_hook via Module.apply(fn), where fn registers a hook that prints the class name during the backward pass, did not get me any further. The prints happen before the exception is raised, so it seems the gradients can be computed, but the backward pass breaks after that.
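Roughly, what I tried looks like this (a simplified sketch, not my real model):

```python
import torch
import torch.nn as nn

def add_print_hook(module):
    # Print the class name of each module as gradients flow through it.
    def hook(mod, grad_input, grad_output):
        print(type(mod).__name__)
    module.register_backward_hook(hook)

model = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 1))
model.apply(add_print_hook)

# Prints the module class names in backward order, then the exception
# (if any) is raised afterwards.
model(torch.randn(2, 4)).sum().backward()
```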


Have you searched the forum and read this post, for example?

I have read most of the posts about this issue on the forum, and even though I understand the problem, I don't know of a way to debug it.

For what it's worth, I already solved the problem. I was indeed storing a Variable in some function deep in the code, but I would still be interested in knowing whether there's a technique for tracking down the Variable that causes this kind of error.
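In case it helps others, the pattern that caused it was essentially this: a tensor kept across iterations that stays attached to the previous iteration's graph (an illustrative sketch, not my real code):

```python
import torch

w = torch.randn(3, requires_grad=True)
hidden = torch.zeros(3)

for step in range(2):
    hidden = hidden * w       # hidden is still attached to the previous step's graph
    hidden.sum().backward()   # second iteration raises the RuntimeError
    w.grad = None

# Fix: cut the history before reusing the stored tensor, e.g.
# hidden = hidden.detach() * w
```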

I think right now the best way is to understand what your computation graph looks like and then check whether you are trying to back-prop through (part of) the graph more than once.
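If it helps, you can get a rough picture of the graph by walking grad_fn.next_functions from the loss (a quick sketch; the output is low-level autograd node names, not module names):

```python
import torch

def walk(fn, depth=0):
    # Recursively print the autograd graph rooted at a grad_fn node.
    if fn is None:
        return
    print("  " * depth + type(fn).__name__)
    for next_fn, _ in fn.next_functions:
        walk(next_fn, depth + 1)

x = torch.randn(3, requires_grad=True)
loss = (x * 2).sum()
walk(loss.grad_fn)  # e.g. SumBackward0 -> MulBackward0 -> AccumulateGrad
```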

I guess a solution with the new release, if you have a network with supported ops, would be to export the network with ONNX twice; the second export should then contain the first one, linked via the Variable that should not be there 🙂