How to check for a memory leak in a model

(Victor Ni) #1

Hi all,

I implemented a model in PyTorch 0.4.0 and found that GPU memory usage increases at seemingly random iterations. For example, it uses 6 GB of GPU memory for the first 1000 iterations, and then at some random iteration it uses 10 GB.

I del loss, image, and label at each iteration and accumulate the loss with total_loss += loss.item(), so I suspect that the model itself sometimes leaks memory.

I also tried using gc to print the live Tensors, and found a small difference between two iterations, but how can I figure out the cause?
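
A minimal sketch of the gc comparison described above, assuming you snapshot the live objects before and after an iteration and look only at what changed. The helper names count_live and diff_counts are hypothetical, not part of PyTorch; the predicate is left as a parameter so the same code works for tensors or any other type.

```python
import gc
from collections import Counter


def count_live(predicate):
    """Hypothetical helper: count gc-tracked objects accepted by `predicate`,
    keyed by (type name, shape)."""
    counts = Counter()
    for obj in gc.get_objects():
        try:
            if predicate(obj):
                shape = tuple(getattr(obj, "shape", ()))
                counts[(type(obj).__name__, shape)] += 1
        except Exception:
            # Some objects raise on inspection; skip them.
            continue
    return counts


def diff_counts(before, after):
    """Hypothetical helper: keep only the keys whose count changed."""
    keys = set(before) | set(after)
    return {k: after[k] - before[k] for k in keys if after[k] != before[k]}
```

With PyTorch, passing torch.is_tensor as the predicate (count_live(torch.is_tensor)) at the same point in two consecutive iterations and printing diff_counts(before, after) should show which tensor shapes are accumulating. A shape that keeps growing in count usually points at something still holding a reference, such as a list of losses that were not converted with .item().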

In our model's forward() we use random cropping and other operations. Could that be causing the memory leak?

Thank you!

(Arul) #2

A helpful comment from @albanD:

(Victor Ni) #3

Thanks for your reply.
I tried calling torch.cuda.empty_cache() after del model, and it works: GPU memory usage dropped to almost zero.
However, my problem is that GPU memory may increase during the iterations of a single model. Should I call torch.cuda.empty_cache() in every iteration?

(Arul) #4

I’m not sure there is a more efficient way than calling it in every iteration. I have noticed that empty_cache() slows training down a bit, so you have to accept that trade-off.
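
One possible middle ground, sketched here as an assumption rather than a recommended fix, is to call empty_cache() only every N iterations so the slowdown is spread out. The helper maybe_empty_cache and its every parameter are hypothetical names introduced for illustration.

```python
def maybe_empty_cache(step, every=100):
    """Hypothetical helper: release cached GPU blocks every `every` steps.

    Returns True if this step was a cleanup step, False otherwise.
    """
    if step % every != 0:
        return False
    import torch  # imported lazily; only needed on cleanup steps

    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    return True
```

Calling maybe_empty_cache(step) at the end of each training iteration would trigger the cleanup on steps 0, 100, 200, and so on. Note that empty_cache() only returns cached blocks that are no longer in use; it does not free memory still referenced by live tensors, so it will not help if something in the loop is holding on to them.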