How to check for a memory leak in a model


(Victor Ni) #1

Hi all,

I implemented a model in PyTorch 0.4.0, but I find that GPU memory usage jumps at seemingly random iterations. For example, it uses about 6 GB for the first 1000 iterations, and then at some random iteration it rises to 10 GB.

I del loss, image, and label and use total_loss += loss.item() at each iteration, so I suspect that the model itself sometimes leaks memory.
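For readers following along, here is a minimal sketch of the loop pattern described above. The model, data, and the train_epoch helper are hypothetical stand-ins rather than the actual OrganSegRSTN code; the point is only that the running total is a Python float from loss.item(), so it does not hold a reference to the autograd graph.

```python
import torch
import torch.nn as nn


def train_epoch(model, batches, optimizer, criterion, device):
    """Run one pass over `batches` and return the summed loss as a Python float."""
    total_loss = 0.0
    for image, label in batches:
        image, label = image.to(device), label.to(device)
        optimizer.zero_grad()
        loss = criterion(model(image), label)
        loss.backward()
        optimizer.step()
        # .item() copies the scalar out as a Python float, so the running
        # total does not keep the graph attached to `loss` alive.
        total_loss += loss.item()
        # Drop the per-iteration references before the next batch is loaded.
        del loss, image, label
    return total_loss


# Hypothetical stand-in model and data, only to make the sketch runnable.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Linear(4, 1).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
batches = [(torch.randn(8, 4), torch.randn(8, 1)) for _ in range(5)]
epoch_loss = train_epoch(model, batches, optimizer, nn.MSELoss(), device)
print("summed loss:", epoch_loss)
```

Writing total_loss += loss instead would accumulate a tensor that still references the graph of each iteration, which is a common cause of steadily growing memory.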

I also tried using gc to print the live Tensors, following https://discuss.pytorch.org/t/how-to-debug-causes-of-gpu-memory-leaks/6741/3?u=victorni, and found a small difference between two iterations, but how can I figure out the cause?
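One way to make the gc approach easier to read is to group the live tensors by device, dtype, and shape, then diff two snapshots. The following is a rough sketch and not the exact snippet from the linked post; live_tensor_summary is a hypothetical helper name, and the leaked list only simulates tensors that are kept alive by accident.

```python
import gc
from collections import Counter

import torch


def live_tensor_summary():
    """Count tensors tracked by the garbage collector, grouped by (device, dtype, shape)."""
    counts = Counter()
    for obj in gc.get_objects():
        try:
            if torch.is_tensor(obj):
                counts[(str(obj.device), str(obj.dtype), tuple(obj.shape))] += 1
        except Exception:
            # Some tracked objects raise on inspection; skip them.
            continue
    return counts


before = live_tensor_summary()
# Stand-in for one training iteration that accidentally keeps tensors alive.
leaked = [torch.zeros(3, 5) for _ in range(2)]
after = live_tensor_summary()

# Counter subtraction keeps only the groups whose count increased.
new_tensors = after - before
for key, count in new_tensors.items():
    print(count, key)
```

The shapes that show up in the diff can then be matched against intermediate results in forward() to narrow down which operation keeps them alive.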

Our model is at https://github.com/twni2016/OrganSegRSTN_PyTorch/blob/master/OrganSegRSTN/model.py. In forward() we use random cropping and other operations; could that be causing the memory leak?

Thank you!


(Arul) #2

A helpful comment from @albanD:


(Victor Ni) #3

Thanks for your reply.
I tried calling torch.cuda.empty_cache() after I del the model, and it does work: GPU memory drops to almost zero.
However, my problem is that GPU memory may increase during the iterations of a single model. So should I call torch.cuda.empty_cache() in every iteration?


(Arul) #4

I’m not sure there is a more efficient way than calling it in every iteration. I have noticed that empty_cache() slows the process down a bit, so you have to accept that trade-off.
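A possible way to limit that cost is to measure first and clear the cache only when a lot of memory is cached but unused. The sketch below is an assumption-laden illustration: gpu_memory_mib is a hypothetical helper, the 1024 MiB threshold is arbitrary, and torch.cuda.memory_reserved() is the name in recent PyTorch releases (older versions such as 0.4.0 call it torch.cuda.memory_cached()).

```python
import torch


def gpu_memory_mib():
    """Return (allocated, reserved) GPU memory in MiB, or (0.0, 0.0) without CUDA."""
    if not torch.cuda.is_available():
        return 0.0, 0.0
    mib = 1024 ** 2
    # allocated: memory held by live tensors.
    # reserved: memory held by PyTorch's caching allocator, including freed blocks.
    return torch.cuda.memory_allocated() / mib, torch.cuda.memory_reserved() / mib


CACHE_SLACK_MIB = 1024  # arbitrary threshold chosen for this sketch

history = []
for step in range(3):
    # ... one training iteration would run here ...
    allocated, reserved = gpu_memory_mib()
    history.append((step, allocated, reserved))
    # Release cached blocks only when much more is reserved than is in use.
    if reserved - allocated > CACHE_SLACK_MIB:
        torch.cuda.empty_cache()

for step, allocated, reserved in history:
    print(f"step {step}: allocated={allocated:.1f} MiB, reserved={reserved:.1f} MiB")
```

If allocated stays flat while reserved grows, the jump is cached memory rather than leaked tensors; if allocated itself keeps growing, some reference is holding tensors alive and empty_cache() will not help.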