I am getting a CUDA out of memory error from a regularizer function that I added in place of an L2 regularizer. As I understand it, since the loss produced by the regularizer is added to the total loss and updated by the optimizer, the variables involved need requires_grad=True. But it seems that, because I create many intermediate variables, I run out of memory. I came across a post talking about volatile and torch.no_grad(), but I didn't quite understand it, and I was wondering whether that would help here. (Reducing the batch size from 128 to 8 didn't help, while a plain L2 regularizer works fine at 128, which suggests the batch size itself is not the problem.)
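For reference, here is my current understanding of torch.no_grad() from that post, as a minimal sketch of my own (the tensor names are made up): inside the context no graph is recorded, so intermediates are cheaper, but nothing computed there can receive gradients. If that is right, I presumably cannot wrap the whole regularizer in it, since its loss has to backpropagate.

```python
import torch

x = torch.randn(3, 3, requires_grad=True)

# Inside no_grad, autograd records nothing: intermediate buffers are not
# kept for backward, but the result is cut off from the graph.
with torch.no_grad():
    y = (x @ x).sum()
print(y.requires_grad)  # False -- y.backward() would raise an error

# The same computation outside no_grad builds a graph and supports backward().
z = (x @ x).sum()
z.backward()
print(x.grad is not None)  # True
```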
PS: The function below just performs two iterations of power iteration to estimate the max singular value of each weight matrix.
```python
import torch
from torch.autograd import Variable

def l2_reg(mdl):
    l2_reg = None
    for w_tmp in mdl.parameters():
        if w_tmp.ndimension() < 2:
            continue
        # Treat conv kernels as 2-D matrices so the matmuls are well-defined.
        w_tmp = w_tmp.view(w_tmp.size(0), -1)
        # Random start vector for the power iteration; it needs no gradient itself.
        b_k = Variable(torch.rand(w_tmp.size(1), 1), requires_grad=False)
        b_k = b_k.cuda()
        # Apply W, normalize by the largest entry, then apply W^T
        # (the transpose keeps the shapes consistent for non-square W).
        w2 = torch.matmul(w_tmp, b_k)
        b1 = torch.max(torch.abs(w2))
        w3 = torch.div(w2, b1)
        w4 = torch.matmul(torch.t(w_tmp), w3)
        # Accumulate the estimated max singular value over all weight matrices.
        if l2_reg is None:
            l2_reg = torch.max(torch.abs(w4))
        else:
            l2_reg = l2_reg + torch.max(torch.abs(w4))
    return l2_reg
```
File "train.py", line 30, in l2_reg_ortho m = torch.matmul(wt,w1) File "/usr/local/torch3/lib/python3.5/site-packages/torch/functional.py", line 173, in matmul return torch.mm(tensor1, tensor2) RuntimeError: cuda runtime error (2) : out of memory at /pytorch/torch/lib/THC/generic/THCStorage.cu:58