Memory usage of a Python process increases slowly

I’m running the torch A3C implementation from here, and the RAM is gradually eaten up: after a few hours it reaches 10 GB. The behavior is reproducible on both Mac and Linux for me.

I tried memory_profiler and objgraph but couldn’t find where the leak occurs. It seems to me that things are OK, but the garbage-collected memory isn’t being returned to the OS or something like that. I tried running “mnist_hogwild” and it doesn’t happen there, though.
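One way to narrow this down is to check whether the growth is visible at the Python level at all. The following is a minimal sketch using the standard-library `tracemalloc` module; the helper `python_heap_growth` and the two toy step functions are hypothetical names made up for illustration and are not part of the A3C code.

```python
import tracemalloc

def python_heap_growth(step, iterations=1000):
    """Run step() repeatedly and report how many bytes of Python-level
    allocations are still alive afterwards, plus the top growing lines."""
    tracemalloc.start()
    step()  # warm-up call so one-time caches are not counted as growth
    before = tracemalloc.take_snapshot()
    for _ in range(iterations):
        step()
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    # Per-source-line differences between the two snapshots.
    stats = after.compare_to(before, "lineno")
    return sum(s.size_diff for s in stats), stats[:5]

_kept = []  # hypothetical leak: references that are never released

def leaky_step():
    _kept.append(bytearray(10_000))

def clean_step():
    bytearray(10_000)  # allocated and freed again right away

if __name__ == "__main__":
    for fn in (leaky_step, clean_step):
        grown, top = python_heap_growth(fn, iterations=200)
        print(fn.__name__, "net growth in bytes:", grown)
        for stat in top[:2]:
            print("   ", stat)
```

In a real run, `step` would be one training iteration. Note that `tracemalloc` only sees allocations made through Python's own allocators, so if it reports flat usage while the process's memory keeps climbing, that would point toward memory held by native code rather than by Python objects.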

Here is the exact code I’m running.
It happens on CPU, with Python 3.5 and torch version 0.1.10+ac9245a.

Any help will be appreciated!

log:

[2017-03-25 21:31:25] INFO [MainThread:67] Time 00h 00m 00s, episode reward 0.0, episode length 102
[2017-03-25 21:31:44] INFO [MainThread:53] Memory usage of one proc: 110.3203125 (mb)
[2017-03-25 21:32:04] INFO [MainThread:53] Memory usage of one proc: 112.55078125 (mb)
[2017-03-25 21:32:24] INFO [MainThread:53] Memory usage of one proc: 114.15625 (mb)
[2017-03-25 21:32:25] INFO [MainThread:67] Time 00h 01m 01s, episode reward 1.0, episode length 100
[2017-03-25 21:32:43] INFO [MainThread:53] Memory usage of one proc: 115.5703125 (mb)
[2017-03-25 21:33:03] INFO [MainThread:53] Memory usage of one proc: 117.25390625 (mb)
[2017-03-25 21:33:23] INFO [MainThread:53] Memory usage of one proc: 118.30859375 (mb)
[2017-03-25 21:33:26] INFO [MainThread:67] Time 00h 02m 02s, episode reward 0.0, episode length 100
[2017-03-25 21:33:44] INFO [MainThread:53] Memory usage of one proc: 119.89453125 (mb)
[2017-03-25 21:34:06] INFO [MainThread:53] Memory usage of one proc: 121.17578125 (mb)
[2017-03-25 21:34:27] INFO [MainThread:53] Memory usage of one proc: 122.94921875 (mb)
[2017-03-25 21:34:27] INFO [MainThread:67] Time 00h 03m 02s, episode reward 0.0, episode length 100
[2017-03-25 21:34:48] INFO [MainThread:53] Memory usage of one proc: 124.7265625 (mb)
[2017-03-25 21:35:09] INFO [MainThread:53] Memory usage of one proc: 126.6015625 (mb)
[2017-03-25 21:35:28] INFO [MainThread:67] Time 00h 04m 03s, episode reward 0.0, episode length 100
[2017-03-25 21:35:31] INFO [MainThread:53] Memory usage of one proc: 128.44140625 (mb)
[2017-03-25 21:35:52] INFO [MainThread:53] Memory usage of one proc: 130.33203125 (mb)
[2017-03-25 21:36:13] INFO [MainThread:53] Memory usage of one proc: 131.453125 (mb)
[2017-03-25 21:36:28] INFO [MainThread:67] Time 00h 05m 04s, episode reward 0.0, episode length 100
[2017-03-25 21:36:35] INFO [MainThread:53] Memory usage of one proc: 133.35546875 (mb)
[2017-03-25 21:36:56] INFO [MainThread:53] Memory usage of one proc: 134.6015625 (mb)
[2017-03-25 21:37:18] INFO [MainThread:53] Memory usage of one proc: 136.1953125 (mb)
[2017-03-25 21:37:29] INFO [MainThread:67] Time 00h 06m 05s, episode reward 0.0, episode length 100
[2017-03-25 21:37:39] INFO [MainThread:53] Memory usage of one proc: 137.76171875 (mb)
[2017-03-25 21:38:00] INFO [MainThread:53] Memory usage of one proc: 139.48046875 (mb)
[2017-03-25 21:38:22] INFO [MainThread:53] Memory usage of one proc: 140.7109375 (mb)
[2017-03-25 21:38:30] INFO [MainThread:67] Time 00h 07m 05s, episode reward 1.0, episode length 102
[2017-03-25 21:38:43] INFO [MainThread:53] Memory usage of one proc: 142.203125 (mb)
[2017-03-25 21:39:04] INFO [MainThread:53] Memory usage of one proc: 144.05078125 (mb)
[2017-03-25 21:39:26] INFO [MainThread:53] Memory usage of one proc: 145.82421875 (mb)
[2017-03-25 21:39:31] INFO [MainThread:67] Time 00h 08m 06s, episode reward 0.0, episode length 100
[2017-03-25 21:39:47] INFO [MainThread:53] Memory usage of one proc: 147.72265625 (mb)
[2017-03-25 21:40:09] INFO [MainThread:53] Memory usage of one proc: 148.81640625 (mb)
[2017-03-25 21:40:30] INFO [MainThread:53] Memory usage of one proc: 150.515625 (mb)
[2017-03-25 21:40:31] INFO [MainThread:67] Time 00h 09m 07s, episode reward 1.0, episode length 100
[2017-03-25 21:40:51] INFO [MainThread:53] Memory usage of one proc: 152.44140625 (mb)
[2017-03-25 21:41:12] INFO [MainThread:53] Memory usage of one proc: 153.80859375 (mb)
[2017-03-25 21:41:32] INFO [MainThread:67] Time 00h 10m 08s, episode reward 0.0, episode length 100
[2017-03-25 21:41:34] INFO [MainThread:53] Memory usage of one proc: 155.18359375 (mb)
[2017-03-25 21:41:55] INFO [MainThread:53] Memory usage of one proc: 156.71484375 (mb)
[2017-03-25 21:42:16] INFO [MainThread:53] Memory usage of one proc: 158.40234375 (mb)
[2017-03-25 21:42:33] INFO [MainThread:67] Time 00h 11m 08s, episode reward 1.0, episode length 109
[2017-03-25 21:42:38] INFO [MainThread:53] Memory usage of one proc: 160.1328125 (mb)
[2017-03-25 21:42:59] INFO [MainThread:53] Memory usage of one proc: 161.53515625 (mb)
[2017-03-25 21:43:21] INFO [MainThread:53] Memory usage of one proc: 163.41015625 (mb)
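For a rough sense of scale, the growth rate can be estimated directly from log lines in the format above. This is only a sketch: the function name `growth_mb_per_hour` is made up for illustration, and it simply compares the first and last memory samples rather than fitting a line.

```python
import re
from datetime import datetime

# Matches the "Memory usage of one proc" lines in the log format shown above.
LINE_RE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\].*"
    r"Memory usage of one proc: (?P<mb>[\d.]+) \(mb\)"
)

def growth_mb_per_hour(log_text):
    """Estimate memory growth (MB/hour) from the first and last samples."""
    samples = []
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match:
            ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S")
            samples.append((ts, float(match.group("mb"))))
    if len(samples) < 2:
        raise ValueError("need at least two memory samples")
    (t0, mb0), (t1, mb1) = samples[0], samples[-1]
    hours = (t1 - t0).total_seconds() / 3600.0
    return (mb1 - mb0) / hours

# First and last memory samples copied from the log above.
first_and_last = """\
[2017-03-25 21:31:44] INFO [MainThread:53] Memory usage of one proc: 110.3203125 (mb)
[2017-03-25 21:43:21] INFO [MainThread:53] Memory usage of one proc: 163.41015625 (mb)
"""
rate = growth_mb_per_hour(first_and_last)
print(f"{rate:.0f} MB/hour for one process")
```

On these two samples the estimate comes out at roughly 270 MB per hour for a single process, so the growth is steady rather than a one-off jump at startup.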

I think it might be similar to this:
https://discuss.pytorch.org/t/tracking-down-a-suspected-memory-leak/

So far I have found that the precompiled pip version of torch worked better than my self-compiled one, but I have not yet found out why.

Thanks @tom,
The problem wasn’t related to multiprocessing. I even removed the LSTM, leaving a plain feedforward model, and I tried both Conda and pip installations.

Recent updates solved this problem for us; see https://github.com/ikostrikov/pytorch-a3c/issues/11

We’ve had a memory leak in the numpy conversion, and your version didn’t have the fix. Updating PyTorch might help you.

Yes, like I said, the problem is gone in a recent update :)

I am also facing the same problem with maskrcnn benchmark. I am running torch 1.4.0.