Hello All,
I am not able to understand an issue. I have two functions acting as a custom regularizer. The first one throws a CUDA out-of-memory error when I set requires_grad=True for the identity matrix, and the second one doesn't. I am not sure why.
This one throws a CUDA out-of-memory error:
import torch
from torch.autograd import Variable

def reg(mdl):
    l2_reg = None
    for W in mdl.parameters():
        if W.ndimension() < 2:
            continue
        else:
            cols = W.size(1)  # number of columns of W, so m below is cols x cols
            wt = torch.transpose(W, 0, 1)
            m = torch.matmul(wt, W)
            **ident = Variable(torch.eye(cols, cols), requires_grad=True)**
            ident = ident.cuda()
            w_tmp = (m - ident)
            if l2_reg is None:
                l2_reg = (torch.max(torch.abs(w_tmp))) ** 2
            else:
                l2_reg = l2_reg + (torch.max(torch.abs(w_tmp))) ** 2
    return l2_reg
This one doesn’t:
def reg(mdl):
    l2_reg = None
    for W in mdl.parameters():
        if W.ndimension() < 2:
            continue
        else:
            cols = W.size(1)  # same as above: ident must match m, which is cols x cols
            wt = torch.transpose(W, 0, 1)
            m = torch.matmul(wt, W)
            **ident = Variable(torch.eye(cols, cols))**
            ident = ident.cuda()
            w_tmp = (m - ident)
            if l2_reg is None:
                l2_reg = (torch.max(torch.abs(w_tmp))) ** 2
            else:
                l2_reg = l2_reg + (torch.max(torch.abs(w_tmp))) ** 2
    return l2_reg
As you can see, the only difference is the highlighted line. I am not sure why setting requires_grad=True should lead to an out-of-memory issue, and why it only shows up after the first epoch of training, even with a batch size as low as 2.
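If it helps with the diagnosis, this is the kind of check I could run around the regularizer call to watch GPU memory grow between epochs. It is only a minimal sketch: it assumes a PyTorch version that provides torch.cuda.memory_allocated() and torch.cuda.max_memory_allocated(), and `model`, `loss`, and the 0.01 weight are placeholders for my actual training objects.

    # Sketch of how I would log GPU memory around the regularizer call.
    # Assumes torch.cuda.memory_allocated()/max_memory_allocated() are available;
    # `model`, `loss`, and the 0.01 weight are placeholders, not my real values.
    import torch

    def log_cuda_memory(tag):
        alloc = torch.cuda.memory_allocated() / 1024 ** 2
        peak = torch.cuda.max_memory_allocated() / 1024 ** 2
        print("{}: allocated {:.1f} MiB, peak {:.1f} MiB".format(tag, alloc, peak))

    log_cuda_memory("before reg")
    penalty = reg(model)           # the function shown above
    loss = loss + 0.01 * penalty   # placeholder regularization weight
    log_cuda_memory("after reg")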