How can I save memory in multi-scale testing?

Hi,

I am evaluating my model, and part of the code looks like this:

        import torch
        import torch.nn.functional as F

        eval_scales = (0.5, 0.75, 1, 1.25, 1.5, 2)
        eval_flip = True
        for i, (imgs, label) in diter:
            N, _, H, W = label.shape
            # accumulate the summed class probabilities over all scales
            probs = torch.zeros((N, n_classes, H, W))
            probs.requires_grad = False

            for sc in eval_scales:
                new_hw = [int(H * sc), int(W * sc)]
                with torch.no_grad():
                    # resize the input to the current scale and run the forward pass
                    im = F.interpolate(imgs, new_hw, mode='bilinear', align_corners=True)
                    im = im.cuda()
                    out = net(im)
                    out = F.interpolate(out, (H, W), mode='bilinear', align_corners=True)
                    prob = F.softmax(out, 1)
                    probs += prob.cpu()
                    if eval_flip:
                        # horizontal-flip augmentation: flip the input, run the model,
                        # flip the prediction back and accumulate it as well
                        out = net(torch.flip(im, dims=(3,)))
                        out = torch.flip(out, dims=(3,))
                        out = F.interpolate(out, (H, W), mode='bilinear',
                                align_corners=True)
                        prob = F.softmax(out, 1)
                        probs += prob.cpu()
                    del out, prob

The problem is this: if I use eval_scales=(2,), the evaluation takes around 4 GB of GPU memory. However, if I use eval_scales=(0.5, 0.75, 1, 1.25, 1.5, 2), the memory usage reaches as much as 8 GB or so.
I wonder why the memory is not released in time. Since at most 4 GB is actually used when I run with an input scale of 2 alone, the smaller scales should not matter or take extra memory. How can I save this extra memory?
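
For what it is worth, this is roughly how I am thinking of checking whether the extra memory is really held by tensors or just cached by the allocator (just a sketch using the built-in CUDA memory stats):

        # Sketch: compare what the caching allocator has handed out to tensors
        # with what it has reserved from the GPU, after one evaluation pass.
        alloc_gb = torch.cuda.memory_allocated() / 1024**3
        reserved_gb = torch.cuda.memory_reserved() / 1024**3
        print(f'allocated: {alloc_gb:.2f} GB, reserved (cached): {reserved_gb:.2f} GB')
        torch.cuda.empty_cache()  # returns unused cached blocks to the GPU driver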

Could you flip the order of eval_scales, i.e. eval_scales = (2, 1.5, 1.25, 1, 0.75, 0.5), and check the memory usage again? Maybe you are seeing some memory fragmentation, which could be avoided by allocating the largest memory block first.
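
A quick check could look roughly like this (just a sketch; it only reorders your eval_scales and reads the allocator's peak statistics around the evaluation loop):

        # Sketch: allocate the largest blocks first, then read the peak allocator usage.
        eval_scales = (2, 1.5, 1.25, 1, 0.75, 0.5)
        torch.cuda.reset_peak_memory_stats()
        # ... run the evaluation loop from your snippet ...
        peak_gb = torch.cuda.max_memory_allocated() / 1024**3
        print(f'peak allocated: {peak_gb:.2f} GB')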


Thanks, that works partially. I also found that if I comment out these lines:

                    if eval_flip:
                        out = net(torch.flip(im, dims=(3,)))
                        out = torch.flip(out, dims=(3,))
                        out = F.interpolate(out, (H, W), mode='bilinear',
                                align_corners=True)
                        prob = F.softmax(out, 1)
                        probs += prob.cpu()

the memory usage drops by around 3 GB. Why is the memory usage reduced, and is there any way I can take advantage of that here?

Because if you save one variable, you are saving the entire computation graph, even with .cpu(). Maybe try .detach().
Also double-check that you are not keeping anything alive at each iteration. You can also put the body of the inner for loop into a function, so that when it goes out of scope it automatically frees all of its memory (except what you return, of course).
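
Roughly like this, as a sketch reusing the names from your snippet (the run_scale name is just for illustration; nothing from the forward pass survives the call except the detached CPU result):

        # Sketch: one scale's forward pass lives inside a function, so every intermediate
        # tensor is freed when it returns; only the detached CPU probabilities are kept.
        def run_scale(net, imgs, sc, H, W, flip):
            new_hw = [int(H * sc), int(W * sc)]
            with torch.no_grad():
                im = F.interpolate(imgs, new_hw, mode='bilinear', align_corners=True).cuda()
                out = F.interpolate(net(im), (H, W), mode='bilinear', align_corners=True)
                prob = F.softmax(out, 1)
                if flip:
                    out = net(torch.flip(im, dims=(3,)))
                    out = F.interpolate(torch.flip(out, dims=(3,)), (H, W),
                            mode='bilinear', align_corners=True)
                    prob = prob + F.softmax(out, 1)
            return prob.detach().cpu()

        for sc in eval_scales:
            probs += run_scale(net, imgs, sc, H, W, eval_flip)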

Thanks. Could I compute some of these tensors in-place, so that I can reduce the memory used by the intermediate variables?
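
For example, something like this (just a sketch of what I mean, accumulating straight into the preallocated probs buffer with add_):

        # Sketch: add the softmax result into probs in-place and drop the intermediate.
        with torch.no_grad():
            out = net(im)
            out = F.interpolate(out, (H, W), mode='bilinear', align_corners=True)
            probs.add_(F.softmax(out, 1).cpu())  # in-place accumulation into probs
            del out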