About loss.backward error

hi,

my network has two branches, a seg branch and a dt branch.
I use the outputs of the two branches to compute a KLDivLoss, but when I call loss.backward() I get the error: one of the variables needed for gradient computation has been modified by an inplace operation.

import torch
import torch.nn as nn
import torch.nn.functional as F


class SegmentationMultiLosses(nn.CrossEntropyLoss):
    """2D Cross Entropy Loss with Multi-Loss"""
    def __init__(self, nclass=-1, weight=None, size_average=True, ignore_index=-1):
        super(SegmentationMultiLosses, self).__init__(weight, size_average, ignore_index)
        self.nclass = nclass

    def forward(self, *inputs):
        out, target_img, target_exist, target_dt = tuple(inputs)
        out_img, out_dt_img, out_exist = out

        loss_seg = super(SegmentationMultiLosses, self).forward(out_img, target_img)
        loss_exist = nn.BCELoss().forward(out_exist, target_exist)
        loss_dt = nn.MSELoss().forward(out_dt_img, target_dt)

        out_dt_norm = self.dt_norm(out_dt_img)
        out_seg_sm = F.softmax(out_img, dim=1)
        loss_fuse = nn.KLDivLoss().forward(out_seg_sm[:, 1:, :, :], out_dt_norm.clone())

        print('loss_seg: {}  loss_exist: {}  loss_dt: {}  loss_fuse: {}'.format(
            loss_seg.item(), loss_exist.item() * 0.1, loss_dt.item(), loss_fuse.item()))
        loss = loss_seg + loss_dt + 0.1 * loss_exist + loss_fuse
        return loss

    def dt_norm(self, dt_out):
        # clamp the tensor to [0, 10] and normalize to [0, 1]
        max = 10
        min = 0
        for i in range(dt_out.shape[0]):
            for j in range(dt_out.shape[1]):
                dt_out[i, j, :, :] = torch.div(torch.clamp(dt_out[i, j, :, :], min, max), max)
        return dt_out


Is the dt_norm function an in-place operation?
thanks

This is bad! Use the non-inplace dt_out = dt_out.clamp(min, max) / max instead.
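
For reference, a minimal toy example that triggers the same kind of error (a throwaway tensor, not your model):

import torch

a = torch.randn(3, requires_grad=True)
b = torch.exp(a)       # the backward of exp re-uses its output b
b[0] = 0.0             # in-place write into b bumps its version counter
b.sum().backward()     # RuntimeError: one of the variables needed for gradient
                       # computation has been modified by an inplace operation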

Best regards

Thomas

def dt_norm(self, dt_out):
    # clamp the tensor to [0, 10] and normalize to [0, 1]
    max = 10
    min = 0
    dt_out = dt_out.clamp(min, max) / max
    return dt_out

Is that right? Are the two functions the same?

and

loss_fuse = nn.KLDivLoss().forward(out_seg_sm[:,1:,:,:], out_dt_norm.clone())
why is loss_fuse < 0?
thank you very much

It doesn’t have the side effect of modifying its input; you could compare the results.
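
For example, on a toy tensor (shapes made up for illustration):

import torch

t = torch.randn(2, 3, 4, 4) * 20    # toy stand-in for dt_out
t_copy = t.clone()

# non-in-place: clamp returns a new tensor, t itself stays untouched
t_norm = t.clamp(0, 10) / 10
print(torch.equal(t, t_copy))       # True: the input was not modified

# in-place: slice assignment writes back into t
t[0, 0, :, :] = t[0, 0, :, :].clamp(0, 10) / 10
print(torch.equal(t, t_copy))       # False: t has been modified in place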

loss_fuse = nn.KLDivLoss().forward(out_seg_sm[:,1:,:,:], out_dt_norm.clone())
why is loss_fuse < 0?

The typical way to get a negative kl_div is that your arguments are not probability distributions, and indeed out_dt_norm has little reason to be one. You could do something like

dt_out_unnormalized = dt_out.clamp(min, max)
dt_out = dt_out_unnormalized / dt_out_unnormalized.sum((2, 3), keepdim=True)

with (2, 3) replaced by whatever dimensions you want the normalization to cover.

(You should be using torch.nn.functional.kl_div, too, and don’t need the clone.)
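
For example, a minimal sketch of a kl_div call that cannot go negative (toy logits, distribution taken over the class dimension) could look like this:

import torch
import torch.nn.functional as F

# toy logits: batch of 2, 5 classes
log_p = F.log_softmax(torch.randn(2, 5), dim=1)   # input: log-probabilities
q = F.softmax(torch.randn(2, 5), dim=1)           # target: probabilities

# sum the pointwise terms over the distribution dim, then average over the batch
kl = F.kl_div(log_p, q, reduction='none').sum(dim=1).mean()
print(kl)   # >= 0 whenever both arguments are proper distributions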

I must admit I don’t know what you are trying to achieve in terms of modelling and so I cannot say if it is mathematically sound or not.

Best regards

Thomas

hi, Thomas V
thanks

class SegmentationMultiLosses(nn.CrossEntropyLoss):
    """2D Cross Entropy Loss with Multi-Loss"""
    def __init__(self, nclass=-1, weight=None, size_average=True, ignore_index=-1):
        super(SegmentationMultiLosses, self).__init__(weight, size_average, ignore_index)
        self.nclass = nclass

    def forward(self, *inputs):
        out, target_img, target_exist, target_dt = tuple(inputs)
        out_img, out_dt_img, out_exist = out

        loss_seg = super(SegmentationMultiLosses, self).forward(out_img, target_img)
        loss_exist = nn.BCELoss().forward(out_exist, target_exist)
        loss_dt = nn.MSELoss().forward(out_dt_img, target_dt)

        out_dt_norm = self.dt_norm(out_dt_img)
        out_seg_sm = F.softmax(out_img, dim=1)
        loss_fuse = nn.KLDivLoss().forward(out_seg_sm[:, 1:, :, :], out_dt_norm)

        print('loss_seg: {}  loss_exist: {}  loss_dt: {}  loss_fuse: {}'.format(
            loss_seg.item(), loss_exist.item() * 0.1, loss_dt.item(), loss_fuse.item()))
        loss = loss_seg + loss_dt + 0.1 * loss_exist + loss_fuse
        return loss

    def dt_norm(self, dt_out):
        # clamp the tensor to [0, 10] and normalize to [0, 1]
        max = 10
        min = 0
        dt_out = dt_out.clamp(min, max) / max
        return dt_out

it works, but I get loss_fuse (the KL loss) < 0

I want to segment a target. One branch of the network is trained with the cross entropy loss. The other branch is a regression branch that predicts a weight for the target. I then normalize this weight and use it to guide the segmentation branch through the KL loss.

out_seg_sm[:,1:,:,:]: output of the softmax
out_dt_norm: normalized to [0, 1], indicating the importance of each pixel of the target

thank you very much

Ah, sorry, the input must be log probs and the targets probs. So logsumexp(inputs, -1) should be 0 and sum(targets, -1) should be 1. I would recommend checking that for a few cases.
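
For example, a quick sanity check on toy tensors (distribution taken over the last dimension here) could be:

import torch

# toy stand-ins: inp is what you would pass as the kl_div input (log probs),
# tgt is what you would pass as the target (probs)
inp = torch.log_softmax(torch.randn(4, 5), dim=-1)
tgt = torch.softmax(torch.randn(4, 5), dim=-1)

print(torch.logsumexp(inp, dim=-1))   # should be ~0 everywhere
print(tgt.sum(dim=-1))                # should be ~1 everywhere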

Best regards

Thomas

hi,
I don’t quite understand. For kl_div(), why must the input logsumexp to 0 and the target sum to 1? Is that over the entire feature map, or over the different channels at each pixel of the feature map?
thanks

hi,
out_dt_img is the discrete value predicted by the regression branch. Should I turn it into a probability distribution like this? Is there a PyTorch function that can directly normalize a 4D tensor [b,c,h,w] over the [b,c,:,:] dimensions, i.e. normalize the pixels on each channel?

thanks

I think you want x = x.reshape(n, c, h * w).(log_)softmax(-1).view(n, c, h, w) if you want a (log-)softmax, or x = x / x.sum((-1, -2), keepdim=True) if you have positive values and want to normalize.
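
Concretely, a small sketch on a dummy [n, c, h, w] tensor (toy shapes; log_softmax for the kl_div input, sum-normalization for the target):

import torch
import torch.nn.functional as F

n, c, h, w = 2, 3, 4, 5        # toy shapes
x = torch.rand(n, c, h, w)     # e.g. the clamped regression output, values >= 0

# option 1: (log-)softmax over the spatial positions of each channel
log_probs = F.log_softmax(x.reshape(n, c, h * w), dim=-1).view(n, c, h, w)

# option 2: divide by the per-channel spatial sum (for non-negative values)
probs = x / x.sum((-1, -2), keepdim=True)

print(torch.logsumexp(log_probs.reshape(n, c, -1), dim=-1))  # ~0 per channel
print(probs.sum((-1, -2)))                                   # ~1 per channel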

Best regards

Thomas