From this thread I found out that I am apparently storing the whole computational graph in my list, because every state I append on each iteration still references it.
How am I supposed to handle this if I need the states for the gradient computation? Can I still detach them at some point?
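To make the question concrete, here is a minimal sketch of what I think is going on. It assumes PyTorch, and the names `w`, `state`, `history_attached` and `history_detached` are made up for illustration; they are not my actual code. The idea is that states I still need for the gradient stay attached until `backward()` has run, while anything kept only for logging could be stored detached.

```python
import torch

# Hypothetical stand-in for the real model: one learnable weight and a scalar state.
w = torch.tensor(0.5, requires_grad=True)
state = torch.tensor(1.0)

history_attached = []  # each entry keeps the graph that produced it alive
history_detached = []  # values only, cut off from the graph

for step in range(5):
    # The new state depends on w through every earlier step.
    state = torch.tanh(w * state)
    history_attached.append(state)           # holds a reference to the graph so far
    history_detached.append(state.detach())  # same value, no graph attached

# Gradients can only flow through the attached states.
loss = torch.stack(history_attached).sum()
loss.backward()

# Once backward() has run, the state can be detached before continuing,
# so that later iterations do not extend the old graph.
state = state.detach()
```

If this matches what the thread describes, then detaching before `backward()` would remove those states from the gradient, while detaching afterwards should only stop the graph from growing further.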