Correct way of storing states within a single forward pass

From this thread I found out that, by appending each state to my list in every iteration, I apparently keep the whole computational graph alive.

How am I supposed to handle this if I need the states for the gradient computation? Can I still detach them at some point?
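
To make the situation concrete, here is a minimal sketch of what I mean. It assumes PyTorch, and the names `cell`, `forward_pass` and `detach_stored` are hypothetical stand-ins made up for this post, not my real model. It contrasts keeping the stored states attached (needed when the loss depends on them) with storing detached copies (enough when they are only logged).

```python
import torch

torch.manual_seed(0)

# Hypothetical toy recurrent cell; a stand-in for the real model.
cell = torch.nn.Linear(4, 4)

def forward_pass(x, steps, detach_stored):
    """Run `steps` iterations and collect the intermediate states."""
    states = []
    h = x
    for _ in range(steps):
        h = torch.tanh(cell(h))
        # Appending h itself keeps a reference to its autograd graph;
        # h.detach() stores only the values and drops that reference.
        states.append(h.detach() if detach_stored else h)
    return h, states

x = torch.randn(2, 4)

# Case 1: the loss depends on the stored states, so they stay attached.
out, states = forward_pass(x, steps=3, detach_stored=False)
loss = sum(s.pow(2).mean() for s in states)
loss.backward()
grad_attached = cell.weight.grad.clone()

# Case 2: the states are only logged; the loss uses the final output.
cell.zero_grad()
out, logged = forward_pass(x, steps=3, detach_stored=True)
out.pow(2).mean().backward()
```

As far as I understand, a detached state no longer contributes to the gradient, so detaching seems safe only for states that are not part of the loss, or after `backward()` has already run.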