Does torch.cholesky have a memory leak?

Hi all,

I noticed that my code consumes more and more memory with each epoch that I train my model. I tracked the leak down by commenting out parts of my code, and it seems that torch.cholesky is the culprit (if I only comment out the line with torch.cholesky, my code shows no increase in memory usage).
I am training on CPU, so this is not an issue of GPU memory but of the actual main memory of the PC. I have seen that in the past there have already been problems with memory leakage and torch.cholesky, however with respect to GPU RAM:

Unfortunately, my model is far too large to provide the code here. So I just wanted to ask if someone has had similar issues?

I check the memory consumption with:
import os
import psutil

def memory_usage_psutil():
    # return the memory usage of this process in MB
    process = psutil.Process(os.getpid())
    mem = process.memory_info().rss / float(10 ** 6)  # / float(2 ** 20) for MiB
    return mem
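
For context, a minimal usage sketch (the matmul is only a placeholder standing in for the real per-epoch work):

import torch

for epoch in range(5):
    _ = torch.randn(100, 100) @ torch.randn(100, 100)  # placeholder for the real training step
    print(f"epoch {epoch}: {memory_usage_psutil():.1f} MB")  # uses the helper defined above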

Hi,

The issue you linked does mention that old versions of MAGMA have memory leaks. Make sure that you use a recent enough version of MAGMA (and if you use the PyTorch binaries, the latest stable version or a nightly).
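
If it helps, you can check what your install was actually built with; torch.__config__.show() prints the build configuration (including the MAGMA version for builds that ship it):

import torch

print(torch.__version__)        # installed PyTorch version
print(torch.__config__.show())  # build configuration (BLAS/LAPACK backend, MAGMA if present, etc.)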

Hi,

thank you for your reply. Do you think that this issue might also affect the main memory? (In that link they only talk about GPU memory.)
Have you ever heard of someone having a similar issue with this function?
I am just wondering because the forum thread I linked is about two years old, and I installed my current PyTorch about half a year ago, so I would assume that I have a MAGMA version that no longer has this bug.

No, I don’t recall any other issue like that.
Are you sure that you don’t hold on to some memory across iterations?

Does passing the exact same inputs to cholesky outside of your model give the same behavior?

Unfortunately, it is pretty difficult to do it outside of my model because there are many computations performed on the data before it is actually handed to torch.cholesky, but as I said, if I comment out the line with the torch.cholesky call, I don’t have these memory issues…

But still, maybe you’re right and these issues stem from something else but only get revealed somehow through the use of torch.cholesky.
When you say “Are you sure that you don’t hold on to some memory across iterations?”, what do you mean by that? What kind of functions or coding patterns could lead to holding onto memory?

Unfortunately, it is pretty difficult to do it outside of my model because there are many computations performed on the data before it is actually handed to torch.cholesky

What about just logging the size of the matrix and trying with a matrix of random values of that size?

When you say “Are you sure that you don’t hold on to some memory across iterations?”, what do you mean by that? What kind of functions or coding patterns could lead to holding onto memory?

This can happen mostly if you:
  • store data in a data structure at each iteration, always appending to it;
  • accumulate values across iterations (like an epoch_loss) as Tensors that still require gradients and therefore hold onto their autograd history.
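
For example, the second pattern usually looks like this (a minimal sketch with a stand-in model, not your actual code):

import torch

model = torch.nn.Linear(10, 1)              # stand-in model, only for illustration
x, y = torch.randn(32, 10), torch.randn(32, 1)

epoch_loss = 0.0
for _ in range(100):
    loss = ((model(x) - y) ** 2).mean()
    # epoch_loss = epoch_loss + loss        # keeps each iteration's autograd graph alive -> memory grows
    epoch_loss += loss.item()               # stores a plain Python float instead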

So I tried what you suggested: in each iteration I create a new random tensor with torch.randn of the same shape as the data, make it positive definite by multiplying it with its transpose, and then compute torch.cholesky of that random matrix instead of the data. The result in terms of memory usage is the same, i.e. the memory usage still increases every epoch.
And if I comment out that torch.cholesky call, the memory usage does still increase, but only marginally, only every 20 epochs or so, and by orders of magnitude less. So to me it still seems like torch.cholesky is the culprit.
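
In rough outline, the test looks like this (the matrix size is arbitrary here; only the structure matters):

import os
import psutil
import torch

process = psutil.Process(os.getpid())

for epoch in range(50):
    a = torch.randn(500, 500)                 # fresh random matrix every iteration
    spd = a @ a.t() + 1e-3 * torch.eye(500)   # make it symmetric positive definite
    _ = torch.cholesky(spd)
    print(f"epoch {epoch}: {process.memory_info().rss / 1e6:.1f} MB")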

Or could you imagine that another error is actually the root of all evil and is only amplified by the torch.cholesky call? I.e., could one of the other possible issues you mentioned still be at the heart of the problem?

Given that you still see issues without it, it is possible yes.

  • Does calling gc.collect() at every iteration solve the issues (both with and without the cholesky)? (See the sketch below.)
  • What do you use when you remove the cholesky? Do you link the output to the input in a differentiable manner? That could explain the reduction in memory when you remove it.
  • Do you use .backward(create_graph=True)?
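
For the first point, a minimal sketch of where a forced collection would go (the cholesky line is just a placeholder for the real per-iteration work):

import gc
import torch

for iteration in range(100):
    a = torch.randn(200, 200)
    _ = torch.cholesky(a @ a.t() + 1e-3 * torch.eye(200))  # placeholder for the real per-iteration work
    gc.collect()  # force Python's garbage collector to run after every iteration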