It looks like you are storing output Variables, and that is the source of the problem: when you store a Variable, you force Python to keep the entire computation graph for that Variable in memory.
You should probably save the underlying tensors instead.
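A minimal sketch of the difference, assuming a recent PyTorch where `.detach()` is the idiomatic way to get the underlying tensor (older code used `.data`):

```python
import torch

x = torch.randn(3, requires_grad=True)
y = (x * 2).sum()

kept = y            # keeps the whole graph alive: y.grad_fn is set
saved = y.detach()  # plain tensor with the same values, no graph attached

print(kept.grad_fn is not None)   # True
print(saved.grad_fn is None)      # True
```

Appending `saved` to a list each step costs only the tensor's storage; appending `kept` also pins every intermediate tensor in its graph.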
When I do a backprop, don't I need that computation graph? I got around this by taking the float values from the Variables, but then my network never actually learned.
EDIT: I just tested this by modifying the PyTorch RL example and saving `m.log_prob(action).data`. It ended up not learning on the backprop and was stuck at an average episode length of 20-21. This leads me to believe I need the computation graph for backprop.
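That matches what I'd expect. A small sketch of why saving the detached value stops learning (hypothetical stand-in for `m.log_prob(action)`):

```python
import torch

w = torch.randn(1, requires_grad=True)
log_prob = w * 3                 # stand-in for m.log_prob(action)

# With the graph intact, backward reaches the parameter:
loss_ok = -log_prob.sum()
loss_ok.backward()
print(w.grad is not None)        # True: gradient flowed back to w

# Saving .data / .detach() severs the graph, so there is nothing
# for backward() to traverse:
w.grad = None
detached = log_prob.detach()
failed = False
try:
    (-detached.sum()).backward()
except RuntimeError:
    failed = True                # no grad_fn, backward raises
print(failed)                    # True
```

The loss built from the detached value is just a number as far as autograd is concerned, so the policy parameters never receive gradients.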
I see.
Well, either you save the `log_prob` of the action together with its computation graph, or you store only the raw states and actions and recompute the `log_prob` from them just before you compute the losses.
So, either you accept ballooning memory, or you accept redoing computations.
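A rough sketch of the second option, assuming a hypothetical `policy` network: keep only graph-free states and actions during the rollout, then rebuild the graph with one forward pass at update time.

```python
import torch
from torch.distributions import Categorical

policy = torch.nn.Linear(4, 2)   # stand-in for your policy network

# Rollout: sample under no_grad so nothing keeps a graph alive.
states, actions = [], []
for _ in range(5):
    s = torch.randn(4)
    with torch.no_grad():
        m = Categorical(logits=policy(s))
        a = m.sample()
    states.append(s)
    actions.append(a)

# Update: one batched forward pass recreates the log_probs with a
# fresh graph, just for the duration of the loss computation.
batch = torch.stack(states)
m = Categorical(logits=policy(batch))
log_probs = m.log_prob(torch.stack(actions))
loss = -log_probs.sum()          # weight by your returns here
loss.backward()
print(policy.weight.grad is not None)   # True
```

You pay one extra forward pass per update, but memory stays flat no matter how long the rollout is.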