I guess torch.log might receive a zero input, which would return -Inf and could then propagate as NaNs, so you could e.g. clamp the values or add a small eps.
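A rough sketch of both workarounds; the eps value and clamp bound here are illustrative choices, not the values nn.BCELoss uses internally:

```python
import torch

p = torch.tensor([0.0, 0.5, 1.0])  # sigmoid outputs including exact 0 and 1
eps = 1e-8

log_plain = torch.log(p)                   # -inf at p == 0
log_eps = torch.log(p + eps)               # eps keeps the argument positive
log_clamped = torch.log(p.clamp(min=eps))  # clamping the input works too

print(torch.isinf(log_plain).any())       # tensor(True)
print(torch.isfinite(log_eps).all())      # tensor(True)
print(torch.isfinite(log_clamped).all())  # tensor(True)
```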

Hi @ptrblck, I’m encountering the same issue, but in my case I implemented a custom BCELoss using the code below, where predicted is the output of torch.sigmoid:

def custom_BCELoss(predicted, target, error_value=-100.0):
    assert target.size() == predicted.size(), 'Size of labels is not the same!'
    # replace the -inf produced by log(0) with a finite error_value
    term_a = torch.log(predicted)
    term_a[torch.isinf(term_a)] = error_value
    term_b = torch.log(1.0 - predicted)
    term_b[torch.isinf(term_b)] = error_value
    loss = -1.0 * (target * term_a + (1.0 - target) * term_b)
    loss = torch.mean(loss)
    return loss

It seems to work fine, but in every run the predictions turn into nan after approximately 100 epochs. I tried to use the anomaly detection function to trace the error,

torch.autograd.set_detect_anomaly(True)

and it’s telling me that the error occurs at

Warning: Error detected in LogBackward (pointing at torch.log(1.0 - predicted))

Since predicted is the output of torch.sigmoid, I believe the cause of the error is the isinf replacement term_b[torch.isinf(term_b)] = error_value, and that I should use torch.clamp to clamp the -inf values to error_value instead? But if replacing -inf is the problem, why would it work for the first 100 epochs?
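To check this suspicion in isolation, a minimal reproduction (my own guess at the failure mode, with illustrative values, using masked_fill as the out-of-place analogue of the in-place isinf assignment): replacing -inf in log’s output still backpropagates through log at the singular point, where the gradient becomes 0 * inf = nan, while clamping the input to log keeps the gradient finite.

```python
import torch

# masking the -inf *output* of log: forward is finite, backward is nan
p = torch.tensor([0.5, 1.0], requires_grad=True)
term_b = torch.log(1.0 - p)                               # -inf at p == 1.0
term_b = term_b.masked_fill(torch.isinf(term_b), -100.0)  # stands in for term_b[torch.isinf(term_b)] = -100.0
term_b.sum().backward()
print(p.grad)  # tensor([-2., nan]) -- nan at the p == 1.0 entry

# clamping the *input* to log instead: gradient stays finite everywhere
q = torch.tensor([0.5, 1.0], requires_grad=True)
torch.log((1.0 - q).clamp(min=1e-8)).sum().backward()
print(q.grad)  # finite everywhere
```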

@ptrblck thanks for the explanation. I tried replacing the isinf masking after

term_b = torch.log(1.0 - predicted)

with a clamp of the log output:

term_b = torch.clamp(term_b, -100.0, 0.0)

and using your code to double check, it still produces nan. In this case, should I add a small 1e-8 inside the log term? But that would produce some difference from the value computed by nn.BCELoss.
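A quick comparison of the eps variant against nn.BCELoss (the eps = 1e-8 value is the one I assumed above) suggests the difference is on the order of eps for non-extreme predictions:

```python
import torch
import torch.nn as nn

eps = 1e-8
pred = torch.tensor([0.1, 0.5, 0.9])
target = torch.tensor([0.0, 1.0, 1.0])

# eps inside the log terms keeps both arguments strictly positive
eps_version = -(target * torch.log(pred + eps)
                + (1.0 - target) * torch.log(1.0 - pred + eps)).mean()
reference = nn.BCELoss()(pred, target)
print((eps_version - reference).abs())  # tiny, roughly eps-scale
```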

Note that nn.BCELoss also clamps its log function outputs, as described in the docs:

For one, if either y_n = 0 or (1 - y_n) = 0, then we would be multiplying 0 with infinity. Secondly, if we have an infinite loss value, then we would also have an infinite term in our gradient, since […]
Our solution is that BCELoss clamps its log function outputs to be greater than or equal to -100. This way, we can always have a finite loss value and a linear backward method.
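That clamped behavior can be sketched directly and checked against nn.BCELoss (test values are illustrative), clamping each log output to be at least -100 as the docs describe:

```python
import torch
import torch.nn as nn

pred = torch.tensor([0.0, 0.3, 0.7, 1.0])    # includes exact 0 and 1
target = torch.tensor([0.0, 0.0, 1.0, 1.0])

# clamp the log *outputs* at -100, as the nn.BCELoss docs describe
log_p = torch.log(pred).clamp(min=-100.0)
log_1mp = torch.log(1.0 - pred).clamp(min=-100.0)
custom = -(target * log_p + (1.0 - target) * log_1mp).mean()

reference = nn.BCELoss()(pred, target)
print(torch.isclose(custom, reference))  # forward values agree
```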

I tested the clamp version and so far there isn’t any error (at 300+ epochs now). It seems the clamp version works? I’ll implement the eps version if something turns out to be wrong with the clamp approach. Thanks for the help.