Hi everyone, I’ve run into a problem with gradient updates when training my network. I’m using U-Net as my model, and I wrote a custom loss function, a Dice loss:

class DiceCoffLoss(nn.Module):
    def __init__(self, label_of_interest=1):
        super(DiceCoffLoss, self).__init__()
        self.labelInterest = label_of_interest
    def forward(self, prediction, segmentation):
        """ inputs are 2d arrays """
        segmentation = torch.autograd.Variable(segmentation)
        prediction = torch.autograd.Variable(prediction)
        if prediction.shape != segmentation.shape:
            raise ValueError("Shape mismatch between given arrays. prediction %s vs segmentation %s" \
                             % (str(prediction.shape), str(segmentation.shape)))
        n_organ_seg = (segmentation == self.labelInterest).sum()
        n_organ_pred = (prediction == self.labelInterest).sum()
        denominator = n_organ_pred + n_organ_seg
        if denominator == 0:
            return torch.tensor(1.0, requires_grad=True)  # mask or predicted is empty of interested label
        iflat = prediction.contiguous().view(-1)
        tflat = segmentation.contiguous().view(-1)
        organ_intersection = (iflat == self.labelInterest) * (tflat == self.labelInterest)  # Subscription operator
        n_organ_intersection = organ_intersection.sum()
        dice = ((2.0 * n_organ_intersection / denominator))
        return torch.tensor(1 - dice, requires_grad=True)

I used my model with other loss functions such as nn.functional.cross_entropy and everything was OK, but with my own loss function loss.backward() does not update the model, and it no longer trains.

I am still working on that and failed to figure out why the gradient didn’t update as usual. Any inspiration would be sincerely appreciated!!

In the first lines of forward you are re-wrapping your predictions and targets in new Variables.
This will detach the tensors from the computation graph, so that no information will flow back to your model.
As Variables are deprecated since 0.4.0, you don’t need to wrap tensors anymore.
However, even in older versions these lines would be problematic.
Try removing them. Do you get an error if you use prediction and segmentation directly to calculate your Dice loss?
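
A quick way to check whether a loss is still connected to the model is to look at its grad_fn and at the parameter gradients after backward(). The snippet below is only a minimal sketch: the nn.Linear model is a hypothetical stand-in for the U-Net, and detach() is used to mimic the effect of re-wrapping described above.

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 1)   # hypothetical stand-in for the U-Net
x = torch.randn(2, 4)     # dummy input batch

prediction = model(x)

# Loss built directly from the model output: still part of the graph.
attached_loss = prediction.mean()

# Loss built from a detached copy (mimics re-wrapping): cut off from the graph.
detached_loss = prediction.detach().mean()

print(attached_loss.grad_fn is not None)  # graph history is present
print(detached_loss.requires_grad)        # no gradient can flow from this one

attached_loss.backward()
# After backward(), every parameter of the model should have a gradient.
has_grads = all(p.grad is not None for p in model.parameters())
print(has_grads)
```

If grad_fn is None on the loss, or the parameter gradients stay None after backward(), the graph was cut somewhere between the model output and the loss.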

Thanks for your reply. Yes, you are right, those two lines did nothing useful. I changed my code as below:

class DiceCoffLoss(nn.Module):
    def __init__(self, label_of_interest=1):
        super(DiceCoffLoss, self).__init__()
        self.labelInterest = label_of_interest
    def forward(self, prediction, segmentation):
        """ inputs are 2d arrays """
        if prediction.shape != segmentation.shape:
            raise ValueError("Shape mismatch between given arrays. prediction %s vs segmentation %s" \
                             % (str(prediction.shape), str(segmentation.shape)))
        n_organ_seg = (segmentation == self.labelInterest).sum()
        n_organ_pred = (prediction == self.labelInterest).sum()
        denominator = n_organ_pred + n_organ_seg
        if denominator == 0:
            return torch.tensor(1.0, requires_grad=True)  # mask or predicted is empty of interested label
        iflat = prediction.contiguous().view(-1)
        tflat = segmentation.contiguous().view(-1)
        organ_intersection = (iflat == self.labelInterest) * (tflat == self.labelInterest)  # Subscription operator
        n_organ_intersection = organ_intersection.sum()
        dice = ((2.0 * n_organ_intersection / denominator))
        return torch.tensor(1 - dice, requires_grad=True)

I’ve restarted the program and nearly 30 epochs have completed so far, but the loss has not changed at all, not even by a tiny amount.
Is anything else wrong?

In the last line you are re-creating a tensor, thus detaching the tensor from the computation graph.
I’m not sure, what self.labelInterest is, but it seems to be some kind of filter to get the current class.
Could you try to remove the creation of a new tensor and just return (1- dice)?
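
To illustrate the point about re-creating a tensor, here is a small sketch. The tensor w is a hypothetical stand-in for model parameters, and the loss is a toy expression rather than the Dice loss from the post. Calling torch.tensor(...) on a value builds a fresh leaf tensor with no history, so backward() on it never reaches w.

```python
import torch

# Hypothetical stand-in for model parameters.
w = torch.tensor([0.2, 0.8], requires_grad=True)

# Toy loss that depends on w through differentiable ops.
loss = 1 - w.sum()

# Re-creating the loss as a new tensor: this is a new leaf with no graph history.
new_leaf = torch.tensor(loss.item(), requires_grad=True)
new_leaf.backward()
grad_after_rewrap = w.grad   # still None: nothing flowed back to w
print(grad_after_rewrap)

# Returning the original expression keeps the graph intact.
loss.backward()
grad_direct = w.grad         # now populated
print(grad_direct)
```

Setting requires_grad=True on the new tensor only makes that tensor itself a trainable leaf; it does not reconnect it to whatever produced its value.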

I apologize for the delay.
About self.labelInterest: yes, you guessed right. I removed the last re-wrapping but nothing changed.
I changed my code to:

import torch.nn.functional as F
import torch.nn as nn
class SoftDiceLoss(nn.Module):
    def __init__(self, label_of_interest=1, weight=None, size_average=True):
        super(SoftDiceLoss, self).__init__()
        self.labelInterest = label_of_interest
    def forward(self, logits, targets):
        smooth = 1
        num = targets.size(0)
        m1 = logits.view(num, -1)
        m2 = targets.view(num, -1)
        intersection = ((m1 == self.labelInterest) * (m2 == self.labelInterest))
        score = 2. * (intersection.sum() + smooth).float() / ((m1 == self.labelInterest).sum() + (m2 == self.labelInterest).sum() + smooth).float()
        score = 1 - score
        score.requires_grad = True  # I get an error without this line of code. It is necessary
        return score

But nothing changed! I even changed the learning rate many times, but it had no effect.
What do you think about overriding the backward function? Is that a good idea for this problem?
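
One thing worth checking in the code above: a comparison such as m1 == self.labelInterest returns a boolean tensor with no gradient history, so a loss built only from such masks cannot send gradients back to the model, whatever requires_grad is set to afterwards. A commonly used alternative is a "soft" Dice computed from probabilities instead of hard label comparisons. The sketch below is not the poster's method; it assumes binary segmentation, a single-channel output of raw logits, and label 1 as foreground, and the class name SoftDiceSketch is hypothetical. With a forward pass made only of differentiable ops, autograd derives the backward pass, so overriding backward should not be needed.

```python
import torch
import torch.nn as nn


class SoftDiceSketch(nn.Module):
    """Hypothetical differentiable Dice loss for binary masks (a sketch only)."""

    def __init__(self, smooth=1.0):
        super().__init__()
        self.smooth = smooth

    def forward(self, logits, targets):
        num = targets.size(0)
        # Probabilities instead of hard labels keep the graph differentiable.
        probs = torch.sigmoid(logits).reshape(num, -1)
        tgt = targets.float().reshape(num, -1)
        intersection = (probs * tgt).sum(dim=1)
        dice = (2.0 * intersection + self.smooth) / (
            probs.sum(dim=1) + tgt.sum(dim=1) + self.smooth
        )
        # Average the per-sample Dice and turn it into a loss.
        return 1.0 - dice.mean()


# Dummy data: assumed shapes (batch, 1, H, W) for logits and masks.
logits = torch.randn(2, 1, 8, 8, requires_grad=True)
targets = torch.rand(2, 1, 8, 8) > 0.5

# A hard comparison drops out of the graph entirely.
hard_mask = logits == 1
print(hard_mask.requires_grad)

loss = SoftDiceSketch()(logits, targets)
loss.backward()
print(loss.grad_fn is not None, logits.grad.shape)
```

At evaluation time the hard Dice score with thresholded predictions can still be reported as a metric; only the training loss needs to stay differentiable.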