Memory Leak during GPU inference

I have a question about a memory leak during PyTorch inference on GPU. My code has memory issues, and after some profiling it seems these two lines are eating memory somehow.


The “outputs” variable is a PyTorch tensor on the GPU, and should be around 1 MB after being converted to a NumPy array. The methods I tried include using gc to release memory and pre-assigning a NumPy array for the predictions, but still, for every 1k samples, these two lines use 0.1 GB of memory. So after 1 million samples, the CPU memory is all gone. Let me know if anyone has encountered a similar situation.
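The two lines themselves were not included in the post, so as a hedged sketch only: the "pre-assign a NumPy array" approach mentioned above might look like the following, where the names preds, ind, save_batch and all shapes are made-up assumptions, and NumPy ones stand in for outputs.cpu().numpy().

```python
import numpy as np

# Hypothetical shapes (the post does not give them): 10,000 samples,
# 256 float32 features each, so `preds` is ~10 MB.
n_samples, dim = 10_000, 256

# Pre-assign one result array up front instead of appending per-batch arrays.
preds = np.empty((n_samples, dim), dtype=np.float32)

def save_batch(preds, ind, batch):
    # In the real code `batch` would be outputs.cpu().numpy(); fancy
    # indexing copies the values into the preallocated array, so no
    # reference to the per-batch buffer is retained afterwards.
    preds[ind] = batch

ind = np.arange(32)  # indices of the samples in the current batch
save_batch(preds, ind, np.ones((32, dim), dtype=np.float32))
```

With this pattern the per-batch array can be garbage-collected after each step, so steady-state memory should stay close to the size of preds.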

I’m not sure I completely understand the issue.

I assume this 0.1 GB of memory is used on the GPU (the host should use ~1 GB in this case)?
If so, how is ind defined? How are you measuring the GPU memory, and is it stable after removing this pandas call?
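For reference, one way to track host-side memory growth from inside the script is the standard-library resource module (Unix-only); the threshold and allocation size below are arbitrary, just enough to make a jump visible.

```python
import resource  # Unix-only standard-library module

def peak_rss_mb():
    # Peak resident set size of this process so far.
    # Note: ru_maxrss is reported in KB on Linux but in bytes on macOS.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024

baseline = peak_rss_mb()
buf = bytearray(50 * 1024 * 1024)  # touch ~50 MB so the jump is visible
assert peak_rss_mb() >= baseline   # ru_maxrss is monotonically non-decreasing
```

Printing peak_rss_mb() every N batches in the inference loop makes it easy to see whether the growth is linear in the number of batches.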

The 0.1 GB is used on the CPU. Notice that I call the cpu() function to move the outputs from GPU to CPU, so that I can save them in CPU memory. ind is just the indices of the corresponding samples, so that I can save the outputs in the right places. The GPU memory is stable; the problem is only on the CPU.

In your code you are appending a 1 MB CPU tensor 1000 times, so a memory increase of ~1 GB would be expected. I still don’t understand why the memory increase is only 0.1 GB.

I think you are right that a 1 MB CPU tensor copied 1000 times should take ~1 GB. However, what I mean is that the NumPy array converted from the tensor is only about 1 MB, yet the conversion process somehow consumes 0.1 GB of memory while the program is running. After this line of code has been executed 1000 times, 100 GB of CPU memory is used. That’s why I am running out of CPU memory, since my machine only has around 90 GB in total.
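It is hard to say without the actual code, but one common way a "1 MB" array ends up costing far more is view/base pinning: an array produced by slicing shares storage with its base, so saving the small view keeps the entire underlying buffer alive. (Similarly, Tensor.numpy() on a CPU tensor shares memory with the tensor, so holding the array also holds the tensor's storage; an explicit .copy() decouples them.) This is only a guess at the cause; a NumPy-only sketch of the effect, with made-up shapes:

```python
import numpy as np

# Stand-in for a large intermediate buffer (~100 MB of float32 here).
big = np.zeros((100, 250_000), dtype=np.float32)

# A ~1 MB slice of it is a *view*: its .base pins the whole 100 MB
# buffer, so appending `view` to a results list keeps all 100 MB alive.
view = big[0]
assert view.base is big
assert view.nbytes == 1_000_000

# An explicit copy is an independent ~1 MB array; once `big` and `view`
# are dropped, the large buffer can actually be freed.
small = view.copy()
assert small.base is None
```

If this is the mechanism at play, replacing the saved array with outputs.cpu().numpy().copy() (or checking the saved array's .base attribute) would be a quick way to confirm it.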

Thanks for clarifying the issue. So on top of the expected ~1 GB you are seeing an additional 100 GB of memory usage?
Could you post a minimal, executable code snippet which would show this behavior, please?