How to check for a memory leak in a model

VictorNi · August 11, 2018, 5:47am

Hi all,

I implemented a model in PyTorch 0.4.0, but GPU memory usage jumps at seemingly random iterations. For example, it uses about 6 GB for the first 1000 iterations, and then about 10 GB at some random later iteration.

I del loss, image, and label, and accumulate with total_loss += loss.item() at each iteration, so I suspect that the model itself sometimes leaks memory.

I also tried using gc to print the live Tensors, following https://discuss.pytorch.org/t/how-to-debug-causes-of-gpu-memory-leaks/6741/3?u=victorni, and found a small difference between two iterations. How can I figure out the cause?
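For reference, the gc-based check can be turned into a snapshot-and-diff, so that only tensors that appear or disappear between two iterations are reported. This is a minimal sketch, not the code from the linked post: the helper names live_tensor_summary and diff_summaries are hypothetical, and it assumes tensors are visible to Python's garbage collector.

```python
import gc
from collections import Counter

import torch


def live_tensor_summary():
    """Count live tensors seen by the garbage collector, keyed by (type, shape, device)."""
    counts = Counter()
    for obj in gc.get_objects():
        try:
            if torch.is_tensor(obj):
                key = (type(obj).__name__, tuple(obj.shape), str(obj.device))
                counts[key] += 1
        except Exception:
            # Some tracked objects raise on inspection; skip them.
            continue
    return counts


def diff_summaries(before, after):
    """Return only the keys whose count changed between two snapshots."""
    delta = Counter(after)
    delta.subtract(before)
    return {key: change for key, change in delta.items() if change != 0}
```

Taking one snapshot at the end of iteration N and another at the end of iteration N+1, then printing diff_summaries(before, after), shows which shapes keep accumulating. A shape whose count grows every iteration is a good candidate for a reference that is being held somewhere.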

Our model is https://github.com/twni2016/OrganSegRSTN_PyTorch/blob/master/OrganSegRSTN/model.py. In forward() we use random cropping and other operations; could that be causing the memory leak?

Thank you!

InnovArul · August 12, 2018, 6:23am

A helpful comment from @albanD:

VictorNi · August 13, 2018, 6:46am

Thanks for your reply.
I tried calling torch.cuda.empty_cache() after del model, and it works: GPU memory drops to almost zero.
However, my problem is that GPU memory can increase during the iterations of a single model. So should I call torch.cuda.empty_cache() in every iteration?

InnovArul · August 13, 2018, 7:49am

I’m not sure there is a more efficient way than calling it in every iteration. I have noticed that empty_cache() slows the process down a bit, so that is a trade-off you will have to accept.
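Before paying that cost, it may help to check whether the growth is in memory held by live tensors or only in memory cached by the allocator, since empty_cache() releases only the latter. A rough sketch, assuming a recent PyTorch release where torch.cuda.memory_reserved() exists (as far as I know, older versions such as 0.4.0 call it torch.cuda.memory_cached()); the helper name cuda_memory_mib is hypothetical:

```python
import torch


def cuda_memory_mib(device=None):
    """Return allocated and reserved CUDA memory in MiB, or None without a GPU."""
    if not torch.cuda.is_available():
        return None
    mib = 1024 ** 2
    return {
        # Memory occupied by tensors that are still referenced.
        "allocated": torch.cuda.memory_allocated(device) / mib,
        # Memory held by PyTorch's caching allocator, including free cached blocks.
        "reserved": torch.cuda.memory_reserved(device) / mib,
    }
```

Logging this every few hundred iterations separates the two cases. If "allocated" stays flat while "reserved" grows, the allocator is caching blocks rather than the model leaking. If "allocated" itself keeps growing, some tensor is still being referenced and empty_cache() will not help.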

addisonklinke · April 23, 2020, 5:09pm

@VictorNi I had a leak in my training loop from tracking the total loss, and was able to resolve it by following the recommendations in this PyTorch FAQ article
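To illustrate the kind of fix involved (a toy sketch, not my actual training code; the model, data, and hyperparameters are placeholders): accumulating the loss tensor itself keeps autograd history attached to the running total, while accumulating a plain Python number does not.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Placeholder model and data, for illustration only.
model = nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.MSELoss()

total_loss = 0.0
for _ in range(5):
    inputs = torch.randn(8, 4)
    targets = torch.randn(8, 1)

    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()
    optimizer.step()

    # "total_loss += loss" would make total_loss a tensor carrying autograd
    # history from every iteration. Converting to a Python float avoids that.
    total_loss += loss.item()

print(f"mean loss: {total_loss / 5:.4f}")
```

The same idea applies to anything stored across iterations, such as lists of predictions or per-batch metrics: detach them or convert them to Python numbers before keeping them.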