Setting requires_grad=True Throws an Out-of-Memory Error

Hello All,

I am not able to understand an issue. I have two functions acting as a custom regularizer. The first one throws a CUDA memory error when I set requires_grad=True for the identity matrix, and the second one doesn't. I am not sure why.

This one throws a CUDA out-of-memory error:

import torch
from torch.autograd import Variable

def reg(mdl):
    l2_reg = None
    for W in mdl.parameters():
        if W.ndimension() < 2:
            continue
        else:
            wt = torch.transpose(W, 0, 1)
            m = torch.matmul(wt, W)
            # cols is assumed to be the number of columns of W, so ident matches m's shape
            ident = Variable(torch.eye(cols, cols), requires_grad=True)   # <-- the highlighted line
            ident = ident.cuda()
            w_tmp = (m - ident)
            if l2_reg is None:
                l2_reg = (torch.max(torch.abs(w_tmp)))**2
            else:
                l2_reg = l2_reg + (torch.max(torch.abs(w_tmp)))**2
    return l2_reg

This one doesn't:

def reg(mdl):
    l2_reg = None
    for W in mdl.parameters():
        if W.ndimension() < 2:
            continue
        else:
            wt = torch.transpose(W, 0, 1)
            m = torch.matmul(wt, W)
            ident = Variable(torch.eye(cols, cols))   # <-- no requires_grad here
            ident = ident.cuda()
            w_tmp = (m - ident)
            if l2_reg is None:
                l2_reg = (torch.max(torch.abs(w_tmp)))**2
            else:
                l2_reg = l2_reg + (torch.max(torch.abs(w_tmp)))**2
    return l2_reg

As we can see, the only difference is the highlighted line. I am not sure why setting requires_grad to True should lead to an out-of-memory issue, and it only happens after the first epoch of training, even with a batch size as low as 2.

This might be a good place to start: http://pytorch.org/docs/master/notes/faq.html#my-model-reports-cuda-runtime-error-2-out-of-memory

It seems that the first function keeps the history of ident for each for-loop iteration, which causes the CUDA memory error.
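If the intention behind requires_grad=True was to keep the regularizer differentiable, that is not needed: the gradient already flows back to W through m, and ident is just a constant target. A minimal sketch to check this (the shapes and names below are placeholders, not taken from your model):

import torch
from torch.autograd import Variable

# Stand-in for a single weight matrix of the model
W = Variable(torch.randn(4, 3).cuda(), requires_grad=True)

m = torch.matmul(torch.transpose(W, 0, 1), W)   # 3 x 3, tracks gradients through W
ident = Variable(torch.eye(3, 3)).cuda()        # constant target, no grad needed
loss = torch.max(torch.abs(m - ident)) ** 2

loss.backward()
print(W.grad is not None)   # True: W still receives a gradient from the regularizer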

I actually observed something interesting (which I don't understand).
As pointed out earlier, the memory error was only seen when I set requires_grad=True for the identity matrix. But if I simply use another norm, such as the L2 norm via torch.norm(w_tmp, 2) instead of torch.max(torch.abs(w_tmp)), I don't see the memory error even though requires_grad is True.
I mean this code doesn't throw the memory error:

def reg(mdl):
    l2_reg = None
    for W in mdl.parameters():
        if W.ndimension() < 2:
            continue
        else:
            wt = torch.transpose(W, 0, 1)
            m = torch.matmul(wt, W)
            ident = Variable(torch.eye(cols, cols), requires_grad=True)   # <-- requires_grad=True, as before
            ident = ident.cuda()
            w_tmp = (m - ident)
            if l2_reg is None:
                l2_reg = (torch.norm(w_tmp, 2))**2
            else:
                l2_reg = l2_reg + (torch.norm(w_tmp, 2))**2
    return l2_reg

So there seems to be some difference between torch.norm and torch.max(torch.abs()) that we are missing.
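One way to narrow this down (a diagnostic rather than an explanation) is to log how much CUDA memory is allocated after each training step for both variants. This is a rough sketch, assuming your PyTorch build exposes torch.cuda.memory_allocated(); if the max/abs version keeps growing step after step while the norm version stays flat, retained graph history is the likely culprit.

import torch

def report(step):
    # Flush pending kernels, then print the memory currently held by tensors
    torch.cuda.synchronize()
    print('step %d: %.1f MB allocated' % (step, torch.cuda.memory_allocated() / 1024 ** 2))

# Inside the training loop, after loss.backward() and optimizer.step():
#     report(step)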