Why is the PyTorch library occupying 13 GB of memory on my GPU?

I’m getting the following error, which I’m not able to solve:

RuntimeError: CUDA out of memory. Tried to allocate 598.00 MiB (GPU 0; 14.73 GiB total capacity; 13.46 GiB already allocated; 337.88 MiB free; 13.46 GiB reserved in total by PyTorch)

This is happening when I’m running my model on test images to produce results.

import torch
from PIL import Image
from torchvision import utils
from torchvision.transforms import ToTensor
from google.colab import files

def buildG(UPSCALE_FACTOR=4):
    netG = Generator(UPSCALE_FACTOR)                   # Generator is the model class defined elsewhere
    netG.train()
    netG.load_state_dict(torch.load(G_weights_load))   # G_weights_load: checkpoint path defined elsewhere
    netG.cuda()

    return netG

netG = buildG()

def test_on_single_image(path='/content/data1/lrtest.jpg', UPSCALE_FACTOR=4):
    img = Image.open(path)
    img1 = ToTensor()(img)                             # PIL image -> CHW float tensor
    sh = img1.shape
    img1 = img1.reshape((1, sh[0], sh[1], sh[2]))      # add a batch dimension -> NCHW
    img2 = netG(img1.cuda(0))                          # run the generator on the GPU
    utils.save_image(img2, imgs_save)                  # imgs_save: output path defined elsewhere
    files.download('/content/temp1.jpg')

My model runs fine in the training loop (on the GPU), so why am I getting this out-of-memory error during testing? I also don’t understand why PyTorch is occupying so much memory; there is no clear indication of what is using it. The weights of netG are only about 3 MB.
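
For reference, PyTorch’s built-in memory counters can show how much is actually allocated versus reserved around the failing call; a minimal sketch (not part of my original script):

import torch

# Print PyTorch's own accounting of GPU memory on the current device.
print(torch.cuda.memory_allocated() / 1024**2, "MiB allocated")
print(torch.cuda.memory_reserved() / 1024**2, "MiB reserved by the caching allocator")
print(torch.cuda.memory_summary())  # detailed per-pool breakdown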

You might be accidentally storing tensors that are still attached to the computation graph, which would show up as steadily increasing device memory usage.
Your current code snippet looks fine, so the problematic line of code might be in another function.
Wrap your code in a with torch.no_grad() block during validation to avoid storing the intermediate tensors and the computation graph.
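
For example, a minimal sketch of the test function from the question wrapped in no_grad (reusing netG, imgs_save, and the imports from above):

def test_on_single_image(path='/content/data1/lrtest.jpg', UPSCALE_FACTOR=4):
    img = Image.open(path)
    img1 = ToTensor()(img).unsqueeze(0)   # add a batch dimension -> NCHW
    with torch.no_grad():                 # no computation graph is built or kept
        img2 = netG(img1.cuda(0))
    utils.save_image(img2, imgs_save)     # imgs_save as defined in the question
    files.download('/content/temp1.jpg')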