I use a Laplace loss function for keypoint detection; here is the loss:
import torch

def laplace(norm, logb):
    """Element-wise Laplace loss: log b + |x - mu| * exp(-log b)."""
    out = logb + torch.mul(norm, torch.exp(-logb))  # + 0.693147 (log 2, a constant offset)
    return out
Here norm is the distance between the A feature map and the ground truth, and logb is the value inferred from the B feature map.
The forward pass is fine and the loss is finite, but a NaN issue appears when I call optimizer.zero_grad(),
and training breaks completely.
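A common cause of this symptom is not zero_grad() itself but the backward pass: if the network predicts a very negative logb, exp(-logb) overflows and the gradient of the loss with respect to logb becomes inf/NaN, which then poisons the next step. A minimal sketch, assuming clamping logb to a bounded range (the bounds of ±10 are an illustrative choice, not from the original post):

```python
import torch

def laplace(norm, logb):
    """Element-wise Laplace loss with logb clamped so exp(-logb) cannot overflow."""
    logb = torch.clamp(logb, min=-10.0, max=10.0)
    return logb + norm * torch.exp(-logb)

# Usage sketch with dummy tensors standing in for the feature maps.
norm = torch.rand(4, 8)                       # |prediction - ground truth|
logb = torch.randn(4, 8, requires_grad=True)  # inferred log-scale
loss = laplace(norm, logb).mean()
loss.backward()
print(torch.isfinite(logb.grad).all())        # gradients stay finite
```

Checking torch.isfinite on the gradients (or enabling torch.autograd.set_detect_anomaly(True)) right after backward() should confirm whether the blow-up happens there rather than in zero_grad().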