RAM use monotonically increases

I’ve been trying to pin down what is behind the constant increase in GPU memory usage, but so far without success.


[it keeps increasing until the kernel dies]

I’ve tried the solutions suggested in other threads (del-ing output, output_max, loss, data, targets, index, inputs and labels; making sure I accumulate with loss.item(); calling gc.collect() and torch.cuda.empty_cache()), but nothing worked. What is the best way to track down what’s causing this?
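For reference, the loop already follows that advice, roughly along these lines (the model, data, loader and optimizer below are placeholders, not my actual code):

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

device = "cuda" if torch.cuda.is_available() else "cpu"

# Placeholder model and data, just to show the loop hygiene.
model = nn.Linear(10, 2).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loader = DataLoader(TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,))), batch_size=8)

running_loss = 0.0
for inputs, targets in loader:
    inputs, targets = inputs.to(device), targets.to(device)
    optimizer.zero_grad()
    output = model(inputs)
    loss = criterion(output, targets)
    loss.backward()
    optimizer.step()
    running_loss += loss.item()        # .item() so the loss tensor (and its graph) isn't accumulated
    del output, loss, inputs, targets  # explicit deletes, as suggested in other threads
torch.cuda.empty_cache()               # release cached blocks back to the driver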

Not sure if this is relevant, but one thing that is different in my implementation is that, although each batch is initially accessed through the dataloader, the dataloader also returns the indices of the accessed data, which are then used with __getitem__(index) to access ‘subsets’ of those data a couple of times.
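Roughly this pattern (the dataset below is just an illustration, not my actual class):

import torch
from torch.utils.data import Dataset, DataLoader

# Illustrative placeholder dataset, not my actual one.
class IndexedDataset(Dataset):
    def __init__(self, data):
        self.data = data

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        # return the index alongside the sample so the training loop knows it
        return self.data[index], index

dataset = IndexedDataset(torch.randn(100, 3, 32, 32))
loader = DataLoader(dataset, batch_size=4)

for batch, indices in loader:
    for i in indices:
        # later, __getitem__ is called again directly to pull 'subsets' of the same items
        patch, _ = dataset[i.item()]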

Let me know if you need further information.

I had the same issue and searched for a solution for a long time. I needed to do two things: delete the forward return right after the call (or transfer it to the CPU), and wrap some tensors that were being stored inside torch.autograd.Variable (I was doing TBPTT at the time). This feels like a hack to me, because Variable is marked as deprecated and there is surely a better way to do it, but after that I realized I no longer had to delete anything inside the forward method.
Maybe it will help you solve the problem.
FYI: It would help to know what exactly you are doing.
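Roughly what I mean, sketched with .detach() instead of the deprecated Variable wrapper (everything here is a placeholder, not actual code from either of us):

import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

# Sketch only: placeholder model and data, just to show where detach() goes.
model = nn.Linear(10, 2).to(device)
criterion = nn.CrossEntropyLoss()
stored_features = []

inputs = torch.randn(8, 10, device=device)
targets = torch.randint(0, 2, (8,), device=device)

out = model(inputs)
stored_features.append(out.detach().cpu())  # the stored copy no longer keeps the graph alive
loss = criterion(out, targets)
loss.backward()
del out, loss                               # drop the last references to the graph output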

Hey there, thanks for the reply.

Which tensors would you store as Variables? I’ve tried deleting the forward returns right after but that didn’t help.

I’ve also noticed something that may be of interest: watching nvidia-smi, during training the usage cycles up to the maximum and back down, whereas during validation it stays steady at about a quarter of the available GPU RAM.

The whole thing I am trying to develop is a bit complicated, but it boils down to this: given an image X, have the model predict bounding boxes, use those boxes to extract patches, and then do that recursively a number of times. Each patch goes through the same CNN and the extracted features are collected into a list. At the end, all the extracted features are concatenated and a fully connected network is used for classification.
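In rough pseudocode terms it looks something like this (all modules and the box-to-patch cropping are placeholders; the real thing is more involved):

import torch
import torch.nn as nn

# Rough sketch of the structure only; the real box prediction and patch
# extraction are more involved than the placeholder crop below.
class RecursivePatchNet(nn.Module):
    def __init__(self, depth=2):
        super().__init__()
        self.cnn = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1),
                                 nn.ReLU(),
                                 nn.AdaptiveAvgPool2d(1),
                                 nn.Flatten())
        self.box_head = nn.Linear(8, 4)              # predicts (x, y, w, h); placeholder
        self.classifier = nn.Linear(8 * (depth + 1), 2)
        self.depth = depth

    def extract(self, patch, level, features):
        feats = self.cnn(patch)                      # same CNN for every patch
        features.append(feats)
        if level < self.depth:
            _boxes = self.box_head(feats)            # in reality the boxes drive the crop
            sub = patch[:, :, : patch.shape[2] // 2, : patch.shape[3] // 2]
            self.extract(sub, level + 1, features)

    def forward(self, x):
        features = []
        self.extract(x, 0, features)
        return self.classifier(torch.cat(features, dim=1))

model = RecursivePatchNet()
out = model(torch.randn(2, 3, 64, 64))               # -> shape (2, 2)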

A colleague suggested http://tech.labs.oliverwyman.com/blog/2008/11/14/tracing-python-memory-leaks/ but I am not sure whether it will turn out to be useful. What do you think?

Storing them as Variables only helps to control the backpropagation flow. That doesn’t seem to be the issue here, unless you are backpropagating through the whole stack of recursive CNNs.
Correct me if I’m wrong, but during training the gradients have to be allocated, so it’s normal that memory usage increases there.
Are the extracted features in the list stored as CUDA tensors? Maybe transferring them with .cpu() helps; you can move them back to CUDA when you need them.
It’s probably not a Python memory leak. Have you tried training on the CPU?
(On my phone, so sorry for the odd capitalization.)
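Something along these lines, with placeholder tensors, just to illustrate the .cpu() round trip:

import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

features = []
for _ in range(3):
    feats = torch.randn(4, 128, device=device)  # stand-in for a CNN feature tensor
    features.append(feats.cpu())                # store only a CPU copy in the list

# move back to the GPU only when the concatenated features are actually needed
stacked = torch.cat(features, dim=0).to(device)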

Hi there.

So it turns out it’s a RAM issue rather than a GPU memory issue. I’ve been trying to pin down where it is coming from, but no luck in solving it so far.

I ran tracemalloc just now (roughly as sketched at the end of this post) and it gives me the following output:

10 files allocating the most memory:

<frozen importlib._bootstrap_external>:487: size=500 KiB, count=5186, average=99 B
/home/nd26/anaconda3/envs/pytorch_env/lib/python3.6/lib2to3/pgen2/grammar.py:108: size=345 KiB, count=5489, average=64 B
/home/nd26/anaconda3/envs/pytorch_env/lib/python3.6/site-packages/torch/nn/functional.py:1189: size=138 KiB, count=5040, average=28 B
/home/nd26/anaconda3/envs/pytorch_env/lib/python3.6/site-packages/torch/serialization.py:469: size=98.0 KiB, count=1105, average=91 B
/home/nd26/anaconda3/envs/pytorch_env/lib/python3.6/site-packages/numpy/lib/arraypad.py:945: size=92.0 KiB, count=1311, average=72 B
/home/nd26/anaconda3/envs/pytorch_env/lib/python3.6/site-packages/torch/tensor.py:33: size=84.7 KiB, count=1920, average=45 B
/home/nd26/anaconda3/envs/pytorch_env/lib/python3.6/site-packages/torch/_utils.py:94: size=78.7 KiB, count=1007, average=80 B
/home/nd26/anaconda3/envs/pytorch_env/lib/python3.6/site-packages/torch/tensor.py:36: size=65.8 KiB, count=702, average=96 B
/home/nd26/anaconda3/envs/pytorch_env/lib/python3.6/site-packages/torch/serialization.py:213: size=64.4 KiB, count=687, average=96 B
/home/nd26/anaconda3/envs/pytorch_env/lib/python3.6/lib2to3/pgen2/grammar.py:122: size=45.9 KiB, count=475, average=99 B

And the 25-frame traceback:

5036 memory blocks: 317.3 KiB
File "/home/nd26/anaconda3/envs/pytorch_env/lib/python3.6/lib2to3/pgen2/grammar.py", line 108: d = pickle.load(f)
File "/home/nd26/anaconda3/envs/pytorch_env/lib/python3.6/lib2to3/pgen2/driver.py", line 134: g.load(gp)
File "/home/nd26/anaconda3/envs/pytorch_env/lib/python3.6/lib2to3/pgen2/driver.py", line 159: return load_grammar(grammar_source)
File "/home/nd26/anaconda3/envs/pytorch_env/lib/python3.6/lib2to3/pygram.py", line 32: python_grammar = driver.load_packaged_grammar("lib2to3", _GRAMMAR_FILE)
File "<frozen importlib._bootstrap>", line 219
File "<frozen importlib._bootstrap_external>", line 678
File "<frozen importlib._bootstrap>", line 665
File "<frozen importlib._bootstrap>", line 955
File "<frozen importlib._bootstrap>", line 971
File "/home/nd26/anaconda3/envs/pytorch_env/lib/python3.6/lib2to3/fixer_util.py", line 7: from .pygram import python_symbols as syms
File "<frozen importlib._bootstrap>", line 219
File "<frozen importlib._bootstrap_external>", line 678
File "<frozen importlib._bootstrap>", line 665
File "<frozen importlib._bootstrap>", line 955
File "<frozen importlib._bootstrap>", line 971
File "/home/nd26/anaconda3/envs/pytorch_env/lib/python3.6/lib2to3/refactor.py", line 25: from .fixer_util import find_root
File "<frozen importlib._bootstrap>", line 219
File "<frozen importlib._bootstrap_external>", line 678
File "<frozen importlib._bootstrap>", line 665
File "<frozen importlib._bootstrap>", line 955
File "<frozen importlib._bootstrap>", line 971
File "/home/nd26/anaconda3/envs/pytorch_env/lib/python3.6/site-packages/past/translation/__init__.py", line 42: from lib2to3.refactor import RefactoringTool
File "<frozen importlib._bootstrap>", line 219
File "<frozen importlib._bootstrap_external>", line 678
File "<frozen importlib._bootstrap>", line 665

I am quite clueless as to what I should do next. Any advice?
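(For reference, the snapshot above was captured with roughly the standard tracemalloc recipe below; this is not my exact script.)

import tracemalloc

tracemalloc.start(25)                      # keep up to 25 frames per allocation

# ... run a few training iterations here ...

snapshot = tracemalloc.take_snapshot()

print("10 files allocating the most memory:")
for stat in snapshot.statistics("lineno")[:10]:
    print(stat)

print("25 frame traceback:")
top = snapshot.statistics("traceback")[0]  # biggest single allocation site
print(f"{top.count} memory blocks: {top.size / 1024:.1f} KiB")
for line in top.traceback.format():
    print(line)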

I believe I’ve just solved this.

It turns out that a specific function was allocating memory that Python was not able to reclaim, and the accumulation of those allocations was causing the increase. To solve this I spawn a new process to run that function and terminate it once I have what I need from it, so the memory is returned to the OS when the process exits.

For anyone that might come across this issue:

from multiprocessing import Process, Queue

reader = mir.MultiResolutionImageReader()
image = reader.open(self.names[index])

def f(x, y, w, h, l, q):
    # getUCharPatch allocates memory that Python never gets back,
    # so run it in a child process and return the patch via the queue
    img = image.getUCharPatch(x, y, w, h, l)
    q.put(img)

q = Queue()
p = Process(target=f, args=(x, y, w, h, l, q))  # relies on fork (the Linux default), since f is a closure
p.start()
img = q.get()   # read the result before joining so the child can flush the queue
p.join()
p.terminate()   # by now the child has exited and its memory is back with the OS