I use a Laplace loss function for keypoint detection; here is the loss:
```python
import torch

def laplace(norm, logb):
    """Element-wise Laplace loss: log b + norm * exp(-log b)
    (the additive constant log 2 ≈ 0.693147 is omitted)."""
    out = logb + torch.mul(norm, torch.exp(-logb))  # + 0.693147
    return out
```
Here `norm` is the distance between feature map A and the ground truth, and `logb` is the value inferred from feature map B.
The forward pass is fine and the loss is finite, but NaNs appear once I call optimizer.zero_grad() and run the training step, and training breaks completely.