Can fellow users share their methods of identifying and removing memory leaks? I'm not returning anything other than a scalar loss from my model/its trainer, hoping the scoping will be enough to free the unused variables, but eventually the entire setup runs out of memory. This is a RAM OOM rather than a GPU OOM. I'm on torch 0.4.1.
Could you post a small code example to reproduce the error?
How are you returning the loss?
Hey, the codebase was too messy to isolate a small sample from, but I believe the following change has done the trick:
```diff
 # Multiply with logprobs
 generator_objective = (advantages * log_probs).sum(dim=0)
-return (generator_objective, cumulative_rewards)
+return (generator_objective, cumulative_rewards.clone())
```
I don't need to call `.backward()` on `cumulative_rewards`, and returning it seems to have been what was creating the issue. Can you tell what's happening under the hood?
I'm not sure how `cumulative_rewards` is being calculated, but it might have a computation graph attached to it. If you store this tensor somewhere or otherwise keep it alive, the whole computation graph attached to it will be kept in memory as well.
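A minimal sketch of that mechanism, assuming `cumulative_rewards` is computed from tensors that require grad. The names `log_probs`, `rewards` and `stored` are hypothetical stand-ins, not the poster's code. Every tensor kept in `stored` still has a `grad_fn`, so autograd cannot free the graph (and the intermediate tensors saved for the backward pass) that produced it:

```python
import torch

stored = []  # hypothetical: e.g. per-episode statistics kept around for logging
for episode in range(3):
    # Hypothetical inputs: rewards derived from a tensor that requires grad.
    log_probs = torch.randn(4, requires_grad=True)
    rewards = log_probs.exp()
    cumulative_rewards = rewards.cumsum(dim=0)
    # Keeping this tensor keeps its grad_fn alive, which in turn keeps
    # the tensors saved for the backward pass (e.g. exp's output) alive.
    stored.append(cumulative_rewards)

print(all(t.grad_fn is not None for t in stored))  # True
```

In a long training loop the list grows every iteration, and each entry pins its own graph in host memory, which matches a slowly growing RAM footprint.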
You could call `detach()` on it to detach it from the computation graph so that the graph can be freed.
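A hedged sketch of that fix, using a hypothetical `trainer_step` that mirrors the return statement from the thread (the inputs are made up for illustration). `detach()` returns a tensor that shares storage but carries no autograd history, and for scalars `.item()` returns a plain Python number. Note that `clone()` on its own is recorded by autograd, so a clone of a graph-attached tensor still has a `grad_fn`; if an independent copy is needed, `detach()` can be followed by `clone()`.

```python
import torch

def trainer_step(advantages, log_probs, rewards):
    # Hypothetical trainer step mirroring the thread's return statement.
    generator_objective = (advantages * log_probs).sum(dim=0)
    cumulative_rewards = rewards.cumsum(dim=0)
    # Only generator_objective needs a graph for .backward();
    # cumulative_rewards is returned without autograd history.
    return generator_objective, cumulative_rewards.detach()

reward_log = []
loss_log = []
for step in range(3):
    log_probs = torch.randn(4, requires_grad=True)
    rewards = log_probs.exp()
    advantages = torch.randn(4)
    objective, cum_rewards = trainer_step(advantages, log_probs, rewards)
    objective.backward()  # by default frees the saved buffers of this graph
    reward_log.append(cum_rewards)     # no grad_fn attached
    loss_log.append(objective.item())  # plain Python float, not a tensor

# clone() is tracked by autograd, so it does not cut the graph by itself.
print(rewards.clone().grad_fn is not None)  # True
print(reward_log[0].grad_fn is None, reward_log[0].requires_grad)  # True False
```

The same reasoning applies to any running statistic: appending `loss` itself to a list keeps every iteration's graph, while appending `loss.item()` keeps only a number.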