If I have a loss function of the form torch.log(-B*torch.exp(X)), what is the best way to keep torch.exp and torch.log from producing nan?
I am assuming X is a real tensor. What is the value of B? If it is positive, then -B*torch.exp(X) is negative and torch.log will return nan. If it is negative, you can rewrite the expression as

torch.log(torch.tensor(-B)) + X  # since log(-B * exp(X)) = log(-B) + X

to avoid evaluating exp on really high values.
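A minimal sketch of that rewrite, assuming every entry of B is negative (the tensor values here are made up for illustration):

```python
import torch

B = -torch.rand(4) - 0.1                   # hypothetical: all entries negative
X = torch.tensor([10.0, 50.0, 100.0, 200.0])

# Naive form: torch.exp(X) overflows to inf in float32 for large X,
# so the log produces inf/nan entries.
naive = torch.log(-B * torch.exp(X))

# Stable form: log(-B * exp(X)) = log(-B) + X, so exp is never evaluated.
stable = torch.log(-B) + X
```

For the small entries of X the two forms agree; for X = 100 or 200 the naive form overflows while the stable one stays finite.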
The actual computation is the log of an expectation, log E[-B*exp(X)], i.e.

torch.log(torch.mean(-B * torch.exp(X)))

And yes, both B and X are tensors, each the output of a different neural network, e.g. B = NN_1(b) + some_additional_calculation and X = NN_2(x) + some_additional_calculation.
torch.logsumexp exists to tackle exactly this case, using the identity:

log(exp(a) + exp(b)) = c + log(exp(a - c) + exp(b - c)), where c = max(a, b)

You can adapt it to handle the scaling (and the mean) with: K*exp(a) = exp(log(K))*exp(a) = exp(a + log(K)).

Or just use .clamp() on the problematic tensors.
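Putting the two identities together for log mean: fold the positive scale -B into the exponent as exp(X + log(-B)), then use torch.logsumexp and subtract log(N). A sketch, again assuming every entry of B is negative (the values are made up):

```python
import torch

B = -torch.rand(5) - 0.1                        # hypothetical: all entries negative
X = torch.tensor([5.0, 90.0, 120.0, 300.0, 60.0])
n = X.numel()

# log(mean(-B * exp(X)))
#   = log((1/N) * sum(exp(X + log(-B))))
#   = logsumexp(X + log(-B)) - log(N)
stable = torch.logsumexp(X + torch.log(-B), dim=0) \
         - torch.log(torch.tensor(float(n)))

# Naive version: exp(300.0) is inf in float32, so the result is inf.
naive = torch.log(torch.mean(-B * torch.exp(X)))
```

Because both B and X come out of networks, this form also keeps the gradients finite, whereas any inf in the naive forward pass propagates nan into backward.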
Thanks a lot for pointing that out!