I am getting a CUDA out of memory error from a regularizer function that I added in place of an L2 regularizer. As I understand it, since the loss produced by the regularizer is added to the total loss and updated by the optimizer, the variables involved need requires_grad=True. But it seems that, because I create many intermediate variables, I run out of memory. I came across a post talking about volatile and torch.no_grad(), but I didn't quite understand it, and I was wondering whether that would help here. (Reducing the batch size from 128 to 8 didn't help, while a plain L2 regularizer works fine at 128, which suggests the batch size itself is not the problem.)
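For reference, here is my current understanding of torch.no_grad() from that post, as a minimal sketch of my own (the tensor names are made up): inside the context no graph is recorded, so intermediates are cheaper, but nothing computed there can receive gradients. If that is right, I presumably cannot wrap the whole regularizer in it, since its loss has to backpropagate.

```python
import torch

x = torch.randn(3, 3, requires_grad=True)

# Inside no_grad, autograd records nothing: intermediate buffers are not
# kept for backward, but the result is cut off from the graph.
with torch.no_grad():
    y = (x @ x).sum()
print(y.requires_grad)  # False -- y.backward() would raise an error

# The same computation outside no_grad builds a graph and supports backward().
z = (x @ x).sum()
z.backward()
print(x.grad is not None)  # True
```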
PS: The function below just performs two iterations of power iteration to estimate the max singular value of each weight matrix.
```python
import torch
from torch.autograd import Variable

def l2_reg(mdl):
    l2_reg = None
    for w_tmp in mdl.parameters():
        if w_tmp.ndimension() < 2:
            continue
        # Treat conv kernels as 2-D matrices so the matmuls are well-defined.
        w_tmp = w_tmp.view(w_tmp.size(0), -1)
        # Random start vector for the power iteration; it needs no gradient itself.
        b_k = Variable(torch.rand(w_tmp.size(1), 1), requires_grad=False)
        b_k = b_k.cuda()
        # Apply W, normalize by the largest entry, then apply W^T
        # (the transpose keeps the shapes consistent for non-square W).
        w2 = torch.matmul(w_tmp, b_k)
        b1 = torch.max(torch.abs(w2))
        w3 = torch.div(w2, b1)
        w4 = torch.matmul(torch.t(w_tmp), w3)
        # Accumulate the estimated max singular value over all weight matrices.
        if l2_reg is None:
            l2_reg = torch.max(torch.abs(w4))
        else:
            l2_reg = l2_reg + torch.max(torch.abs(w4))
    return l2_reg
```
File "train.py", line 30, in l2_reg_ortho m = torch.matmul(wt,w1) File "/usr/local/torch3/lib/python3.5/site-packages/torch/functional.py", line 173, in matmul return torch.mm(tensor1, tensor2) RuntimeError: cuda runtime error (2) : out of memory at /pytorch/torch/lib/THC/generic/THCStorage.cu:58