Hi,

Doing predictions of 3D trajectories, I am using an LSTM to output the parameters of a distribution (a 3D Gaussian). I then compute the loss as `-dist.log_prob(ground_truth).mean()` to maximize the likelihood of the ground-truth values.

Instead of `log(prob)`, I would like the method to compute `log(prob + epsilon)`. Is it possible?
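One way to get this behaviour with the `torch.distributions` API is to exponentiate the log-probability, add the epsilon, and take the log again. A minimal sketch, with toy stand-ins for the network's predicted parameters and an assumed `eps = 1e-6`:

```python
import torch
from torch.distributions import MultivariateNormal

eps = 1e-6  # assumed value; tune for your data scale

# toy stand-ins for the LSTM's predicted distribution parameters
mean = torch.zeros(4, 3, requires_grad=True)   # batch of 3-D means
cov = torch.eye(3).repeat(4, 1, 1)             # identity covariances
dist = MultivariateNormal(mean, covariance_matrix=cov)

ground_truth = torch.full((4, 3), 10.0)        # far-away outlier points

# log(prob + eps) instead of log(prob): exponentiate, add eps, re-log
loss = -(dist.log_prob(ground_truth).exp() + eps).log().mean()
loss.backward()
print(loss.item())  # bounded above by -log(eps), even for tiny probabilities
```

Note that when `prob` underflows to zero, the gradient through the `exp()` also vanishes, so outliers stop contributing to the update instead of blowing it up.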

The issue is: `log_prob` worked on a *test* dataset, but on the real, noisy dataset the loss explodes after a while. In some trajectories I can see random points that are suddenly really far from the previous and next points. This likely causes some `prob` values to be infinitesimal, which makes some `-log_prob()` values really big, which causes the grads to explode.

I think `log(prob + epsilon)` would solve my problem, but the method `.log_prob()` does not offer this option.

PS: Another solution would be to clip the gradients, but as discussed in this GitHub issue, it is not (yet) doable when using `nn.LSTM`, only when using `nn.LSTMCell`.
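If the issue in question is about clipping gradients of the *intermediate hidden states*, note that clipping the *parameter* gradients after `backward()` does work with `nn.LSTM` via `torch.nn.utils.clip_grad_norm_`. A sketch, assuming parameter-gradient clipping is acceptable for your case (sizes and `max_norm=1.0` are placeholder choices):

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=3, hidden_size=16, batch_first=True)
head = nn.Linear(16, 3)  # hypothetical output head for the 3-D targets

x = torch.randn(2, 5, 3)        # (batch, time, features)
target = torch.randn(2, 5, 3)

out, _ = lstm(x)
loss = (head(out) - target).pow(2).mean()
loss.backward()

# clip the global norm of all parameter gradients before optimizer.step()
params = list(lstm.parameters()) + list(head.parameters())
total_norm = nn.utils.clip_grad_norm_(params, max_norm=1.0)
```

This bounds the size of each update step even when the loss itself is huge, which is a common complement to the epsilon trick.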

=> `log(prob + eps)` would kind of “limit” the log to small values, and most importantly limit the grads to small values too.
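A related trick that achieves the same bound without exponentiating (so `prob` is never materialized and cannot underflow) is to clamp the log-probability from below at `log(eps)`. A sketch, again assuming `eps = 1e-6`:

```python
import math
import torch
from torch.distributions import Normal

eps = 1e-6  # assumed floor, playing the role of eps in log(prob + eps)

dist = Normal(torch.zeros(3), torch.ones(3))
outlier = torch.full((3,), 50.0)        # a point absurdly far from the mean

raw = dist.log_prob(outlier)            # very large negative values
clamped = raw.clamp(min=math.log(eps))  # floored at log(eps)
loss = -clamped.mean()
print(loss.item())  # at most -log(eps) ≈ 13.82
```

As with `log(prob + eps)`, clamped outliers contribute zero gradient, so the loss stays finite and the grads stay bounded.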