While changing my model I encountered a RuntimeError: "Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time." The error message itself is not very helpful.
I have a normal training loop and don’t use LSTMs or anything recurrent, so this shouldn’t happen.
Is there any way to find out which part of the model is responsible?
There might be something calculated globally, i.e. outside the training loop. The first time you call .backward(), the computation graph is freed; the second call still needs that ‘global’ variable’s part of the graph to compute gradients, but it has already been freed (hence the error). So check your code for variables that enter the computation graph but live outside the loop.
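A minimal sketch of how this happens (the tensors `w` and `h` are made up for illustration, not from the original code): a tensor computed once before the loop drags its graph into every iteration’s loss, and the second backward hits the freed buffers.

```python
import torch

# A tensor computed once, outside the loop -- its graph is reused every step.
w = torch.ones(1, requires_grad=True)
h = w * w  # non-leaf: backward through it needs tensors saved by this op

errors = []
for step in range(2):
    loss = (h * 3).sum()
    try:
        loss.backward()  # the first call frees the graph behind `h`
    except RuntimeError as e:
        errors.append(str(e))

# The second iteration raises the familiar error; recomputing `h` inside
# the loop (or detaching it) makes both iterations succeed.
```

The fix is to either recompute such values inside the loop or cut them out of the graph with .detach() before reusing them.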
Is there really no way to get the offending calculation?
I’ve stripped down my network, but it’s still occurring… Like I said, my architecture is quite complicated and there are quite a few moving parts, so there are still a lot of places that could be responsible.
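One way to localize it in a complicated setup (a debugging sketch, not from the thread): any tensor that survives from one iteration to the next is a suspect. Detach the suspects one at a time and see when the error disappears. Here `running` is a hypothetical statistic carried across steps that also feeds into the loss:

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

running = torch.zeros(())  # carried across iterations -> prime suspect
for step in range(3):
    x = torch.randn(2, 4)
    loss = model(x).pow(2).mean() + 0.01 * running
    loss.backward()
    opt.step()
    opt.zero_grad()
    # Detaching cuts the link to this step's graph. Drop the .detach()
    # and the next backward would re-traverse this step's freed graph,
    # raising exactly the "backward a second time" error.
    running = loss.detach()
```

If removing a particular .detach() reintroduces the error, that tensor is the offending cross-iteration connection.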
I bet the error is just a stupid mistake somewhere deep in my code.