Need help with a custom loss

Hi there,

The problem is that the gradients are None when I use my custom loss (a Dice loss).

1. To verify that the setup is correct, I first used

criterion = nn.BCEWithLogitsLoss(size_average=False, reduce=True)

The setup is indeed correct, as I can retrieve gradients for my parameters using

grad_of_params = {}
for name, parameter in model.named_parameters():
    grad_of_params[name] = parameter.grad
    print(name, parameter.grad)

2. When I use my own criterion, the gradients become None.

criterion = Dice_Loss()

class Dice_Loss(torch.nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, output, target):
        pred_mask = nn.functional.sigmoid(output)

        # threshold the probabilities at 0.5 and wrap the result in a new Variable
        pred_mask = torch.autograd.Variable(pred_mask.gt(0.5), requires_grad=True)
        pred_mask = pred_mask.type(torch.cuda.FloatTensor)

        # true positives, false positives and false negatives, each divided by 88
        true_pos = torch.sum(torch.mul(pred_mask, target))
        true_pos = true_pos / 88

        flipped_mask = torch_flip(target)
        false_pos = torch.sum(torch.mul(flipped_mask, pred_mask))
        false_pos = false_pos / 88

        flipped_pred = torch_flip(pred_mask)
        false_neg = torch.sum(torch.mul(target, flipped_pred))
        false_neg = false_neg / 88

        # dice = 2 * TP / (2 * TP + FN + FP)
        n = torch.mul(true_pos, 2)
        dice = torch.div(n, (n + false_neg + false_pos))
        return 1 - dice


def torch_flip(arr):
    # "flip" a binary mask: 1 - arr
    return torch.mul(arr, -1) + 1

As I could not see where the code has gone wrong, any suggestions or recommendations on what to debug are very welcome.

Many thanks

You are re-wrapping your pred_mask into a new Variable, which detaches it from the computation graph.
Just try to use:

pred_mask = pred_mask > 0.5

Also, you could update to the latest release, where Variables and tensors were merged.
You can find the install instructions on the website and the migration guide here.
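
A minimal sketch of what goes wrong (toy tensors, written for a recent release where tensors and Variables are merged):

import torch

x = torch.randn(4, requires_grad=True)
y = torch.sigmoid(x)

# re-wrapping the result in a brand-new leaf tensor (the old Variable(...) pattern)
# cuts the autograd graph, so backward() from here never reaches x
wrapped = (y > 0.5).float().detach().requires_grad_(True)
wrapped.sum().backward()
print(x.grad)  # None - the graph was cut at the re-wrap

# the differentiable sigmoid output still carries the graph
y.sum().backward()
print(x.grad)  # now populated with d sigmoid(x) / dx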

I think the threshold will also kill your gradients.
You could try this implementation.
Would this work for you?

Thank you very much for the links. I thought I could be a bit more imaginative and use thresholding inside the loss. However, thresholding is a step function, whose gradient is zero everywhere except at 0.5, where it is undefined (infinite). That is why I was getting zero or None gradients.
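
A tiny illustration with toy values:

import torch

x = torch.tensor([0.3, 0.7], requires_grad=True)
p = torch.sigmoid(x)

hard = (p > 0.5).float()   # step function: piecewise constant
print(hard.requires_grad)  # False - the comparison is not differentiable, so the graph stops here

p.sum().backward()         # the soft probabilities keep a usable gradient
print(x.grad)              # non-zero: sigmoid(x) * (1 - sigmoid(x))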

This is my updated version, which gives non-zero gradients during backpropagation.

class Dice_Loss(torch.nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, output, target, smooth=1):
        pred_mask = nn.functional.sigmoid(output)

        true_pos = torch.sum(torch.mul(pred_mask, target))
        true_pos = 2 * true_pos

        # just in case there is no 1 in the prediction or target
        # helps against overfitting as well
        true_pos = true_pos + smooth

        pred_square = torch.sum(torch.mul(pred_mask, pred_mask))
        target_square = torch.sum(torch.mul(target, target))
        return 1 - (true_pos / (pred_square + target_square + smooth))
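
A minimal usage sketch (the model and tensor shapes are made up, just to check the gradients):

import torch
import torch.nn as nn

model = nn.Conv2d(1, 1, kernel_size=3, padding=1)  # toy model, only for illustration
output = model(torch.randn(4, 1, 64, 64))
target = torch.randint(0, 2, (4, 1, 64, 64)).float()

criterion = Dice_Loss()
loss = criterion(output, target)
loss.backward()

for name, parameter in model.named_parameters():
    print(name, parameter.grad)  # no longer None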