Hello all,
I am implementing WideResNet in PyTorch. Training works perfectly with an L2-norm regularizer at a batch size of 64 on 2 GPUs (1080 Ti). But when I replace it with an infinity-norm term defined as (torch.max(torch.abs(w_tmp)))**2, I get a CUDA out-of-memory error within one epoch of training: memory usage shoots up from 100 MB to 12 GB, even with a batch size as small as 4. The function used for regularization is:
def l2_regu(mdl):
    # accumulate the squared infinity norm (largest absolute entry) of every
    # weight tensor, skipping 1-D parameters such as biases and BN scales
    l2_reg = None
    for W in mdl.parameters():
        if W.ndimension() < 2:
            continue
        else:
            w_tmp = W
            if l2_reg is None:
                l2_reg = (torch.max(torch.abs(w_tmp)))**2
            else:
                l2_reg = l2_reg + (torch.max(torch.abs(w_tmp)))**2
    return l2_reg
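For context, here is a minimal sketch of how the regularizer is added to the loss in a training step. The model, loss, and coefficient below are placeholders, not the actual WideResNet setup:

```python
import torch
import torch.nn as nn

def l2_regu(mdl):
    # sum of squared infinity norms (max absolute entry) over weight tensors
    l2_reg = None
    for W in mdl.parameters():
        if W.ndimension() < 2:
            continue
        term = (torch.max(torch.abs(W))) ** 2
        l2_reg = term if l2_reg is None else l2_reg + term
    return l2_reg

# placeholder model, not the real WideResNet
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
lam = 1e-4  # hypothetical regularization coefficient

x = torch.randn(4, 8)
y = torch.randint(0, 2, (4,))

optimizer.zero_grad()
loss = criterion(model(x), y) + lam * l2_regu(model)
loss.backward()
optimizer.step()
```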
If I simply replace that line with l2_reg = w_tmp.norm(2)**2, everything works fine. Also, I am using (torch.max(torch.abs(w_tmp)))**2 for the infinity norm because the built-in norm always returned 1 or raised an error when used with a Variable.
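For reference, this is the equivalence I expected between the two formulations. The check below passes on recent PyTorch builds, where Tensor.norm flattens the tensor when no dim is given, even though the norm call misbehaved in my environment:

```python
import torch

w = torch.randn(64, 128)

# infinity norm written out by hand: largest absolute entry, squared
by_hand = torch.max(torch.abs(w)) ** 2

# the same quantity via the built-in norm (recent PyTorch versions)
builtin = w.norm(float('inf')) ** 2

print(torch.allclose(by_hand, builtin))
```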
Regards,
Nitin