I am predicting 3D trajectories, using an LSTM to output the parameters of a distribution (a 3D Gaussian). I then compute the loss as -dist.log_prob(ground_truth).mean(), to maximize the likelihood of the ground-truth values. Instead of computing log(prob), I would like the method to compute log(prob + epsilon). Is that possible?
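One way to get the effect of log(prob + eps) without changing .log_prob() itself is to combine its output with log(eps) in log-space, using the identity log(p + eps) = logaddexp(log p, log eps). A minimal sketch, where the eps value and the toy distribution are placeholders rather than anything from the actual model:

```python
import torch
from torch.distributions import MultivariateNormal

eps = 1e-6  # assumed density floor; tune for your data

# toy 3D Gaussian standing in for the LSTM's predicted distribution
dist = MultivariateNormal(loc=torch.zeros(3), covariance_matrix=torch.eye(3))

ground_truth = torch.tensor([10.0, 10.0, 10.0])  # a far-away outlier point

log_p = dist.log_prob(ground_truth)  # very negative for outliers
# log(p + eps) computed stably in log-space: bounded below by log(eps)
safe_log_p = torch.logaddexp(log_p, torch.log(torch.tensor(eps)))

loss = -safe_log_p.mean()
```

Because logaddexp works entirely in log-space, prob is never materialized, so nothing underflows; safe_log_p can never drop below log(eps), which also bounds the gradient magnitude.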
The issue is: log_prob worked fine on a test dataset, but on the real, noisy dataset the loss explodes after a while. In some trajectories I can see random points that are suddenly very far from the previous and next points. This likely makes some prob infinitesimal, which makes some -log_prob() really big, which in turn makes the gradients explode.
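This mechanism is easy to reproduce in isolation: a tight predicted Gaussian evaluated at a far-away point yields an enormous negative log-density, and the gradient with respect to the predicted mean is proportionally enormous. The numbers below are made up for illustration, not taken from the actual model:

```python
import torch
from torch.distributions import MultivariateNormal

# tight predicted distribution (small variance), as an LSTM might emit
mu = torch.zeros(3, requires_grad=True)
dist = MultivariateNormal(loc=mu, covariance_matrix=1e-2 * torch.eye(3))

outlier = torch.tensor([5.0, 5.0, 5.0])  # a sudden jump in the trajectory

loss = -dist.log_prob(outlier)
loss.backward()

print(loss.item())            # huge positive loss
print(mu.grad.norm().item())  # correspondingly huge gradient norm
```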
log(prob + epsilon) would solve my problem, but the method .log_prob() does not offer this option.
PS: Another solution would be to clip the gradients, but as discussed in this GitHub issue, that is not (yet) doable when using nn.LSTM… only when using …. In any case, log(prob + eps) would kind of “limit” the log to small values and, most importantly, limit the gradients to small values too.
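For what it's worth, clipping the final parameter gradients after backward (a different operation from the per-timestep clipping discussed in that issue) does work with nn.LSTM. A sketch with made-up layer sizes and a stand-in loss:

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)  # toy sizes
head = nn.Linear(16, 9)  # e.g. mean + covariance parameters of a 3D Gaussian

x = torch.randn(4, 20, 8)        # (batch, time, features)
out, _ = lstm(x)
loss = head(out).pow(2).mean()   # stand-in loss for the demo
loss.backward()

# global-norm clipping of parameter gradients; independent of nn.LSTM internals
params = list(lstm.parameters()) + list(head.parameters())
torch.nn.utils.clip_grad_norm_(params, max_norm=1.0)
```

This bounds the update step even when a single outlier produces an enormous loss, though unlike log(prob + eps) it does not bound the loss value itself.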