Cuda runtime error (2) : out of memory at /pytorch/torch/lib/THC/generic/THCStorage.cu:58

I am getting a CUDA out-of-memory error from a regularizer function that I added in place of a plain L2 regularizer. As I understand it, since the loss produced by the regularizer is added to the total loss and updated as part of the optimizer step, the variables involved should have requires_grad=True.

But it seems that, because I am using many intermediate variables, I am running out of memory. I came across a post discussing volatile and no_grad, but I don't think I quite understood it, and I was wondering whether it would help in this case. (Reducing the batch size from 128 to 8 didn't help, and a simple L2 regularizer works fine, which suggests 128 is not too big a batch size for this model.)
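For reference, my understanding is that torch.no_grad() simply turns off graph construction inside its scope; a minimal sketch of what I mean (not from my training code):

    import torch

    x = torch.rand(4, 4, requires_grad=True)

    # Inside no_grad, autograd records nothing, so no intermediate
    # buffers are saved for a backward pass.
    with torch.no_grad():
        y = x * 2

    print(y.requires_grad)  # False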

PS: The function below just performs two iterations of power iteration to find the max singular value.

Function:

import torch
from torch.autograd import Variable


def l2_reg(mdl):
    reg = None
    for w_tmp in mdl.parameters():
        # Skip biases and other 1-D parameters.
        if w_tmp.ndimension() < 2:
            continue

        # Random start vector for the power iteration; requires_grad=False
        # so autograd treats it as a constant.
        b_k = Variable(torch.rand(w_tmp.shape[1], 1), requires_grad=False)
        b_k = b_k.cuda()

        # First step: multiply by W and normalize by the largest entry.
        w2 = torch.matmul(w_tmp, b_k)
        b1 = torch.max(torch.abs(w2))
        w3 = torch.div(w2, b1)

        # Second step: multiply by W^T so the shapes also line up for
        # non-square weight matrices.
        w4 = torch.matmul(torch.t(w_tmp), w3)

        # Accumulate the estimated max singular value per weight matrix.
        if reg is None:
            reg = torch.max(torch.abs(w4))
        else:
            reg = reg + torch.max(torch.abs(w4))
    return reg
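For completeness, this is roughly how the regularizer enters the training loss (a sketch; lam, criterion, output, and target are placeholders for my actual setup):

    # Sketch: add the spectral regularizer to the task loss.
    # lam, criterion, output, and target are placeholders here.
    reg = l2_reg(model)
    loss = criterion(output, target) + lam * reg
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()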

Error:

  File "train.py", line 30, in l2_reg_ortho
    m  = torch.matmul(wt,w1)
  File "/usr/local/torch3/lib/python3.5/site-packages/torch/functional.py", line 173, in matmul
    return torch.mm(tensor1, tensor2)
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/torch/lib/THC/generic/THCStorage.cu:58

I had encountered this issue before and used gc.collect() in my forward() at the time, but adding gc.collect() doesn't seem to help here. My hunch is that the extra intermediate variables used here are the reason for the memory error, but I am not sure how to avoid them.
I would really appreciate it if someone could suggest/comment on this issue.

Found the mistake in the regularizer function: I shouldn't have declared the Variable with requires_grad=True. It works fine after that.
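Concretely, the fix was along these lines (only the flag changes):

    # Before (caused the OOM): b_k is a leaf Variable that autograd
    # has to compute gradients for.
    # b_k = Variable(torch.rand(w_tmp.shape[1], 1), requires_grad=True)

    # After: b_k is treated as a constant by autograd.
    b_k = Variable(torch.rand(w_tmp.shape[1], 1), requires_grad=False)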

Why does declaring it as a Variable with requires_grad=True make it run out of memory? Why is that a bad thing?