Proper way to avoid divide by 0 in custom loss function

Hi Henry!

It looks like your issue is due to a troublesome bug in the innards of
autograd – not specific to torch.where(), but in lower-level infrastructure.

However, in your use case, you can work around it by clamping the
denominator of your potential divide-by-zero away from zero. Here
is an illustrative script that contains a modified version of your custom
loss function:

import torch
from torch import nn
import torch.nn.functional as F

print('torch.__version__', torch.__version__)

torch.manual_seed(2021)

class ConditionalMeanRelativeLoss(nn.Module):
    def __init__(self):
        super(ConditionalMeanRelativeLoss, self).__init__()
    
    def forward(self, output, target):
        # calculate absolute errors
        absolute_errors = torch.abs(torch.subtract(output, target))
        # where target is too small, use just the absolute errors to avoid divide by 0
        loss = torch.where(torch.abs(target) < 0.001, absolute_errors, torch.abs(torch.divide(absolute_errors, target)))
        print('pre-mean loss =', loss)
        # return mean loss
        return torch.mean(loss)
    

class ConditionalMeanRelativeLossB(nn.Module):
    def __init__(self):
        super(ConditionalMeanRelativeLossB, self).__init__()
    
    def forward(self, output, target):
        # calculate absolute errors
        absolute_errors = torch.abs(torch.subtract(output, target))
        # where target is too small, use just the absolute errors to avoid divide by 0
        # but clamp abs(target) away from zero to avoid a "ghost" divide by 0
        abs_target = torch.abs(target).clamp(min=0.0005)
        loss = torch.where(abs_target < 0.001, absolute_errors, torch.divide(absolute_errors, abs_target))
        print('pre-mean loss (B) =', loss)
        # return mean loss
        return torch.mean(loss)
    

outputA = torch.randn(5)
outputB = outputA.clone()
outputA.requires_grad = True
outputB.requires_grad = True
target = torch.randn(5)
target[2] = 0.0
target[3] = 0.0

print('outputA =', outputA)
print('outputB =', outputB)
print('target =', target)

ConditionalMeanRelativeLoss()(outputA, target).backward()
print('outputA.grad  =', outputA.grad)

ConditionalMeanRelativeLossB()(outputB, target).backward()
print('outputB.grad  =', outputB.grad)

And here is its output:

torch.__version__ 1.7.1
outputA = tensor([ 2.2871,  0.6413, -0.8615, -0.3649, -0.6931], requires_grad=True)
outputB = tensor([ 2.2871,  0.6413, -0.8615, -0.3649, -0.6931], requires_grad=True)
target = tensor([ 0.9023, -2.7183,  0.0000,  0.0000,  0.4822])
pre-mean loss = tensor([1.5346, 1.2359, 0.8615, 0.3649, 2.4375], grad_fn=<SWhereBackward>)
outputA.grad  = tensor([ 0.2216,  0.0736,     nan,     nan, -0.4148])
pre-mean loss (B) = tensor([1.5346, 1.2359, 0.8615, 0.3649, 2.4375], grad_fn=<SWhereBackward>)
outputB.grad  = tensor([ 0.2216,  0.0736, -0.2000, -0.2000, -0.4148])
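If you prefer, the same idea can be expressed with a second torch.where() instead
of clamp(): substitute a harmless placeholder into the denominator before dividing,
so that the division is finite in both the forward and backward passes. Here is a
minimal sketch of that pattern (the 1e-3 threshold and the ones_like() placeholder
are just illustrative choices, not taken from your code):

```python
import torch

t = torch.tensor([0.0, 2.0], requires_grad=True)
# replace near-zero entries with a harmless placeholder BEFORE dividing,
# so neither the forward nor the backward pass ever divides by zero
safe_t = torch.where(t.abs() < 1e-3, torch.ones_like(t), t)
out = (1.0 / safe_t).sum()
out.backward()
print(t.grad)  # the zero entry gets a clean zero gradient, not nan
```

Because the placeholder branch is a constant, the entries that were substituted
contribute exactly zero gradient, just as the clamp() version gives them the
(finite) gradient of the clamped division.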

As to the autograd bug: a cluster of GitHub issues shows that this is a
known problem. I don’t understand all of the details, but some of the
comments suggest that this bug might be tricky to fix, and perhaps won’t
get fixed.

But I think (probably in general, not just in your use case) that if you
understand what is going on, you can work around it.
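To make the failure mode concrete, here is a toy example of my own (not from your
code). torch.where()’s backward correctly routes a zero gradient to the unselected
branch, but that zero is then multiplied by the unselected branch’s own local
derivative, which for 1 / x at x = 0 is infinite, and 0 * inf is nan:

```python
import torch

x = torch.zeros(1, requires_grad=True)
# the forward pass picks the safe branch (y = x = 0), but backward still
# evaluates the local derivative of 1 / x at x = 0, which is -inf;
# the routed-in zero gradient times -inf gives nan
y = torch.where(x == 0.0, x, 1.0 / x)
y.backward()
print(x.grad)  # tensor([nan])
```

This is why the fix has to keep the dangerous branch itself finite (clamping the
denominator), rather than relying on torch.where() to mask it out after the fact.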

Here are a few of the relevant GitHub issues:

Best.

K. Frank
