I’m training a model to predict landmarks on faces. My loss function looks like the following:
"
import torch  # model_ft, inputs, target, and potenz are defined elsewhere

logits = model_ft(inputs)            # raw network outputs
out = torch.sigmoid(logits)          # squash predictions to [0, 1]
loss_temp = torch.abs(out - target) ** potenz
loss_temp[torch.isnan(target)] = 0   # zero the loss where the landmark is missing
loss = torch.mean(loss_temp)
loss.backward()
"
Not all landmarks are provided for every sample, which is why I set the loss to zero for the missing landmarks. With potenz=1 everything works fine, but if I change it to potenz=2, or any other value such as potenz=1.0001, the gradients of the model's weights become NaN after the first backward pass.
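Here is a stripped-down toy example that, as far as I can tell, reproduces the same behavior (made-up tensors instead of my real model and data):

"
import torch

# One valid entry and one NaN standing in for a missing landmark
diff = torch.tensor([0.5, float('nan')], requires_grad=True)

loss_temp = torch.abs(diff) ** 2     # potenz = 2; the NaN entry stays NaN
loss_temp[torch.isnan(diff)] = 0     # zero it out afterwards, as in my code
loss_temp.mean().backward()

print(diff.grad)                     # prints tensor([0.5000, nan]) for me
"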
I have tried very small learning rates, as well as exponents like 0.0001, 0.9999, and 1.0001: the gradients only become NaN when I choose an exponent other than 0 or 1. Since I inspected the gradients right after the first backward pass, I don’t think the loss is exploding.
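For reference, this is roughly how I check the gradients after the first backward pass (model_ft is the network from the snippet above):

"
# Largest absolute gradient per parameter, right after loss.backward()
for name, param in model_ft.named_parameters():
    if param.grad is not None:
        print(name, param.grad.abs().max().item())
"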