Laplace loss: NaN issue involving two output feature values of the CNN

I use a Laplace loss function for keypoint detection; here is the loss:

import torch

def laplace(norm, logb):
    """Element-wise Laplace loss: logb + norm * exp(-logb)."""
    # exp(-logb) = 1/b; the constant log(2) (~0.693147) is omitted since it carries no gradient
    out = logb + torch.mul(norm, torch.exp(-logb))  # + 0.693147
    return out

Here, norm is the distance between feature map A and the ground truth, and logb is the value predicted in feature map B.
The forward pass is fine and the loss is finite, but the NaN issue appears after I call optimizer.zero_grad(), and training breaks completely.
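
For reference, this is roughly how I run the step and check the gradients afterwards; model, norm, and logb are placeholders for my actual tensors:

loss = laplace(norm, logb).mean()
optimizer.zero_grad()
loss.backward()
# Check every parameter gradient for NaN/inf after the backward pass
for name, param in model.named_parameters():
    if param.grad is not None and not torch.isfinite(param.grad).all():
        print(f"non-finite gradient in {name}")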

Could someone help me?

If the invalid values are created during the backward pass, you could add torch.autograd.set_detect_anomaly(True) at the beginning of your script. This should yield a stack trace pointing to the operation in the backward pass that created the NaNs.
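
A minimal sketch of where to put it (the loss computation below is just a placeholder for your actual training step):

import torch

# Enable anomaly detection once, before the training loop; backward will then
# raise a RuntimeError with a traceback at the first op that produces NaN/inf.
torch.autograd.set_detect_anomaly(True)

loss = laplace(norm, logb).mean()  # placeholder for your real loss computation
loss.backward()  # fails loudly at the offending backward op

Note that anomaly detection adds noticeable overhead, so it should only be enabled while debugging, not for regular training runs.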