Custom loss function causes loss to go NaN after certain epochs?

My guess is that in the first epochs the output logits were not yet saturated, so torch.sigmoid didn't return an exact zero or one.
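For illustration, once a logit grows large enough in magnitude, float32 sigmoid saturates to exactly 0 or 1, and the following torch.log returns -inf (the ±200 logit values here are just picked to force the saturation):

import torch

logits = torch.tensor([-200., 200.])
probs = torch.sigmoid(logits)  # saturates to exactly 0. and 1. in float32
print(torch.log(probs))
> tensor([-inf, 0.])
print(torch.log(1 - probs))
> tensor([0., -inf])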

Replacing the invalid values after they have been computed won't avoid the invalid gradients, since backward still differentiates through the original torch.log:

import torch

predicted = torch.zeros(1, requires_grad=True)
term_a = torch.log(predicted)  # forward value is -inf

# Patching the forward value doesn't help: autograd still differentiates
# through torch.log, and the overwritten entry gets 0 * inf = nan as gradient.
term_a[torch.isinf(term_a)] = -100.
term_a.backward()
print(predicted.grad)
> tensor([nan])
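Instead, keep the values valid before the log is applied. A minimal sketch of two common approaches, assuming a binary-cross-entropy-style loss (the eps value is illustrative, not a fixed recommendation):

import torch
import torch.nn.functional as F

eps = 1e-7  # illustrative epsilon; pick something appropriate for your dtype

# Option 1: clamp inside the computation, so no invalid value is ever created
predicted = torch.zeros(1, requires_grad=True)
term_a = torch.log(predicted.clamp(min=eps))
term_a.backward()
print(predicted.grad)  # clamped entries get a zero, i.e. finite, gradient
> tensor([0.])

# Option 2: skip the explicit sigmoid + log and pass the raw logits to the
# built-in loss, which uses the log-sum-exp trick and stays finite even
# when the logits are saturated
logits = torch.tensor([-200.], requires_grad=True)
loss = F.binary_cross_entropy_with_logits(logits, torch.ones(1))
loss.backward()
print(logits.grad)
> tensor([-1.])

Option 2 is generally preferable, since the saturated probability is never materialized at all.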