Custom loss function causes loss to go NaN after certain epochs?

I’m using the loss function suggested here for weighted loss in binary classification:

import torch

def weighted_binary_cross_entropy(output, target, weights=None):
    # output: predicted probabilities in (0, 1), target: binary labels
    if weights is not None:
        assert len(weights) == 2
        # weights[1] scales the positive class term, weights[0] the negative class term
        loss = weights[1] * (target * torch.log(output)) + \
               weights[0] * ((1 - target) * torch.log(1 - output))
    else:
        loss = target * torch.log(output) + (1 - target) * torch.log(1 - output)

    return torch.neg(torch.mean(loss))

but after a certain number of epochs, the loss goes NaN:

tensor(0.1091, device='cuda:0', grad_fn=<NegBackward>)
17.43889331445098

tensor(nan, device='cuda:0', grad_fn=<NegBackward>)
nan

This is the output I get when I use BCELoss instead of the custom loss:

tensor(0.1180, device='cuda:0', grad_fn=<BinaryCrossEntropyBackward>)
23.021491292864084

What is causing this?

I guess torch.log might get a zero input, which would return -Inf and might propagate as a NaN issue, so you could e.g. clamp the values or add a small eps.
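As a minimal sketch of that suggestion (the function name, eps value, and clamping approach here are assumptions, not part of the original post), the loss above could be stabilised by clamping the probabilities before they reach torch.log:

import torch

def weighted_binary_cross_entropy_stable(output, target, weights=None, eps=1e-7):
    # keep the probabilities strictly inside (0, 1) so torch.log never sees an exact 0
    output = output.clamp(min=eps, max=1.0 - eps)

    if weights is not None:
        assert len(weights) == 2
        loss = weights[1] * (target * torch.log(output)) + \
               weights[0] * ((1 - target) * torch.log(1 - output))
    else:
        loss = target * torch.log(output) + (1 - target) * torch.log(1 - output)

    return torch.neg(torch.mean(loss))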


Hi @ptrblck, I’m also encountering the same issue, but in my case I implemented my custom BCELoss using the code below, where predicted is the output of torch.sigmoid:

import torch

def custom_BCELoss(predicted, target, error_value=-100.0):
    assert target.size() == predicted.size(), 'Size of labels is not the same!'

    # log(p); replace -inf (from p == 0) with a large negative constant
    term_a = torch.log(predicted)
    term_a[torch.isinf(term_a)] = error_value

    # log(1 - p); replace -inf (from p == 1) with the same constant
    term_b = torch.log(1.0 - predicted)
    term_b[torch.isinf(term_b)] = error_value

    loss = -1.0 * (target * term_a + (1.0 - target) * term_b)
    loss = torch.mean(loss)

    return loss

It seems to work fine, but in all cases the predicted values start producing NaN after approximately 100 epochs. I tried to use anomaly detection to trace the error,

torch.autograd.set_detect_anomaly(True)

and it’s telling me that the error occurs at

Warning: Error detected in LogBackward. torch.log(1.0 - predicted)

Since predicted is the output of torch.sigmoid, I believe the cause of the error should be the isinf replacement term_b[torch.isinf(term_b)] = error_value, and that I should use torch.clamp to clamp the -inf values to error_value instead? But if replacing -inf is a problem, why would it work for the first 100 epochs?

I guess in the first epochs the output logits were not saturated and torch.sigmoid didn’t return a zero or one yet.
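For illustration (the logit value here is just an example, not from the original reply), even a moderately large logit already saturates in float32:

import torch

logit = torch.tensor([20.0])
prob = torch.sigmoid(logit)      # rounds to exactly 1.0 in float32
print(prob)                      # tensor([1.])
print(torch.log(1.0 - prob))     # tensor([-inf])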

Replacing the invalid values after they were calculated won’t avoid computing invalid gradients:

predicted = torch.zeros(1, requires_grad=True)
term_a = torch.log(predicted)               # log(0) = -inf
term_a[torch.isinf(term_a)] = -100.         # patch the value after the fact
term_a.backward()
print(predicted.grad)
> tensor([nan])
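By contrast, a small variation (not in the original reply; the eps value is an assumption) that clamps the input of torch.log instead of patching its output keeps the gradient finite:

import torch

predicted = torch.zeros(1, requires_grad=True)
term_a = torch.log(predicted.clamp(min=1e-7))   # log never receives an exact zero
term_a.backward()
print(predicted.grad)
> tensor([0.])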

@ptrblck thanks for the explanation. I tried replacing

term_b = torch.log(1.0 - predicted)

with

term_b = torch.clamp(torch.log(1.0 - predicted), -100.0, 0.0)

and, using your code to double check, it seems like it will still produce NaN. In this case, should I add a small eps like 1e-8 inside the log term? But that would produce some difference from the value calculated with nn.BCELoss.

Adding a small eps value might work.

Note that nn.BCELoss also clamps its log function outputs as described in the docs:

For one, if either y_n = 0 or (1 - y_n) = 0, then we would be multiplying 0 with infinity. Secondly, if we have an infinite loss value, then we would also have an infinite term in our gradient, since […]
Our solution is that BCELoss clamps its log function outputs to be greater than or equal to -100. This way, we can always have a finite loss value and a linear backward method.
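Regarding the worry above about the eps version differing from nn.BCELoss, a quick sketch to compare the two on non-saturated probabilities (the eps value and tensor shapes are assumptions):

import torch
import torch.nn as nn

torch.manual_seed(0)
eps = 1e-7

predicted = torch.sigmoid(torch.randn(16))   # probabilities strictly inside (0, 1)
target = torch.randint(0, 2, (16,)).float()

# eps-stabilised custom loss
custom = -(target * torch.log(predicted + eps)
           + (1.0 - target) * torch.log(1.0 - predicted + eps)).mean()
reference = nn.BCELoss()(predicted, target)

# for non-saturated inputs the difference should be on the order of eps
print(custom.item(), reference.item(), (custom - reference).abs().item())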

I tested the clamp version and so far there isn’t any error (at 300+ epochs now). It seems like the clamp version works? I guess I will implement the eps version if something goes wrong with the clamp version. Thanks for the help!