This simply contains the layers from a pretrained VGG16 network. I have tried with torch.no_grad() and I also tried not inheriting from nn.Module. I have tried deleting my variables as well as not deleting them. The problem persists, and even on an A100 40GB I get a CUDA out of memory error after just about 200 images. Looking forward to a quick response.
In the code snippet posted in the linked issue you are accumulating the loss and are thus also storing the computation graphs which are potentially attached to it:
for i in range(len(ground_truth)):
    p = lpips.im2tensor(lpips.load_image(predictions[i]))
    g = lpips.im2tensor(lpips.load_image(ground_truth[i]))
    if use_gpu:
        p = p.cuda()
        g = g.cuda()
    # mean_total keeps a reference to each iteration's computation graph
    mean_total = mean_total + loss_fn.forward(g, p).mean()
    im_counter = im_counter + 1
    del p
    del g
    torch.cuda.empty_cache()
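To avoid this, detach the loss from the computation graph before accumulating it, e.g. via .item(). A minimal sketch assuming the same lpips setup (loss_fn, predictions, ground_truth, and use_gpu as in your snippet):

mean_total = 0.0
with torch.no_grad():  # disables graph creation for the forward passes entirely
    for pred_path, gt_path in zip(predictions, ground_truth):
        p = lpips.im2tensor(lpips.load_image(pred_path))
        g = lpips.im2tensor(lpips.load_image(gt_path))
        if use_gpu:
            p = p.cuda()
            g = g.cuda()
        # .item() returns a plain Python float, so no graph is kept alive
        mean_total += loss_fn(g, p).mean().item()
mean_lpips = mean_total / len(ground_truth)

With the accumulator holding plain floats, the intermediate activations can be freed after each iteration, and the del and empty_cache() calls become unnecessary.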
The linked line of code only uses a subset of the features of the pretrained model, so I don’t think it’s “leaking” memory. Could you describe how you narrowed down the leak to this line of code?
I printed the memory usage before and after certain lines, narrowing it down to the lines I shared the GitHub link for. But I will try to find a workaround for the .forward call, maybe calling detach().cpu().numpy() before doing the addition.
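Roughly along these lines, using torch.cuda.memory_allocated(), which reports the memory currently held by tensors (the measured line is just an example):

torch.cuda.synchronize()
before = torch.cuda.memory_allocated()
loss = loss_fn(g, p).mean()
torch.cuda.synchronize()
after = torch.cuda.memory_allocated()
print(f"forward allocated {(after - before) / 1024**2:.1f} MB")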