Advice on CUDA and memory management?

I’m running an off-policy RL algorithm with DeepMind’s PySC2, and I’m finding I quickly run out of GPU memory. My PC only has 4 GB of VRAM, so if this is a bad plan from the start, just let me know.

Essentially, the run loop of the program goes:
Actor and critic are initialised on the GPU.

  1. Observe the environment
  2. Process the observations (into CUDA tensors, e.g. minimap_features, a 1 x 4 x 64 x 64 tensor)
  3. actor.forward(processed observations)
  4. Store a bunch of information in the replay buffer (I am storing, for example, minimap_features.cpu())

The problem I’m having appears to be that the memory is never deallocated: cuda.memory_allocated and max_memory_allocated stay (almost) equal to each other and increase constantly and linearly. My guess is that something I am doing is keeping references to previously computed variables that should have been freed. Any advice? (If you want to know more about a specific part of the code, I can show it.)
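For reference, here is roughly what the loop looks like. This is only a stripped-down stand-in (a dummy conv layer and random data in place of the real actor/critic and the pysc2 observation processing), not the actual code:

```python
import torch
import torch.nn as nn

device = torch.device("cuda")

# Stand-ins for the real actor network and the pysc2 observation processing
actor = nn.Conv2d(4, 8, kernel_size=3, padding=1).to(device)
replay_buffer = []

for step in range(1000):
    # 1./2. observe the environment and process into CUDA tensors (random data here)
    minimap_features = torch.rand(1, 4, 64, 64, device=device)
    # 3. forward pass through the actor
    policy = actor(minimap_features)
    # 4. store in the replay buffer
    replay_buffer.append(minimap_features.cpu())
    # these are the readings that keep climbing in the real run
    print(torch.cuda.memory_allocated(), torch.cuda.max_memory_allocated())
```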

If you see an increase in memory usage, you are most likely right that some references to variables are being kept alive.
Could you check your code for stored tensors which are not detached from the computation graph, e.g. losses.append(loss)? This is a common cause of increasing memory usage, since loss still holds the whole computation graph if you don’t call detach() on it.
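For example, something like this (a generic training loop, not your code) would keep growing in memory if the detached append were replaced by losses.append(loss):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.MSELoss()

losses = []
for _ in range(100):
    x, y = torch.randn(32, 10), torch.randn(32, 1)
    loss = criterion(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # losses.append(loss)        # would keep a reference to the whole graph of every iteration
    losses.append(loss.detach()) # stores only the value, so the graph can be freed
```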

I solved this. It turns out that storing the tensors in the replay buffer with .cpu() was keeping a reference to the computation graph; using .cpu().data fixed it.
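In code, the change was essentially this (minimap_features standing in for each tensor that goes into the buffer):

```python
import torch

x = torch.rand(1, 4, 64, 64, device="cuda", requires_grad=True)
minimap_features = x * 2                            # carries a grad_fn, i.e. a reference to the graph

replay_buffer = []
replay_buffer.append(minimap_features.cpu())        # before: the CPU copy still carries autograd history
print(replay_buffer[-1].grad_fn)                    # prints a grad_fn -> graph kept alive
replay_buffer.append(minimap_features.cpu().data)   # after: .data returns a tensor without autograd tracking
print(replay_buffer[-1].grad_fn)                    # None -> no reference to the graph
```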

Good to hear it’s fixed. However, using the .data attribute is not recommended, so I would use .detach() instead. :wink:
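i.e., for whatever tensors go into the buffer (reusing the names from above):

```python
# .detach() explicitly cuts the link to the computation graph,
# .cpu() then moves the detached copy off the GPU
replay_buffer.append(minimap_features.detach().cpu())
```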