I have a neural network representing the policy for an RL agent. Sometimes the training process fails with NaN values in the model parameters.
I realized that it happens when the policy is repeatedly updated toward a very similar Normal while the scale is already very small. I have reproduced the error below.
- It happens when the training process keeps iterating even after the values for scale are well converged. Is there any way to detect this and stop the iteration? Things could get complicated if we have an array of loc and scale.
- Is sigmoid a good choice for an activation function whose output is used as the scale?
```python
import torch as th

loc = th.tensor(0.5, requires_grad=True)
scale = th.tensor(-45., requires_grad=True)  # sigmoid(-45) is ~3e-20, an almost collapsed scale
optimizer = th.optim.Adam((loc, scale))

for i in range(10):
    N = th.distributions.Normal(loc, scale.sigmoid())
    loss = -N.log_prob(th.tensor(0.5))  # log_prob blows up as the scale shrinks
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()  # parameters eventually become NaN
```
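For the scalar case, the kind of check I have in mind looks something like the sketch below (the `eps` threshold and the `scales_collapsed` helper are my own placeholders, not anything established): stop the loop once the scale has shrunk below a floor. The `.all()` at least makes the check accept an array of scales, but it also shows why the array case gets complicated, since stopping only when *every* entry has collapsed still lets the already-collapsed entries produce extreme `log_prob` values in the meantime.

```python
import torch as th

def scales_collapsed(scale: th.Tensor, eps: float = 1e-6) -> bool:
    # True once every entry of a (possibly array-valued) scale is below eps.
    return bool((scale < eps).all())

loc = th.tensor(0.5, requires_grad=True)
raw = th.tensor(-45., requires_grad=True)  # pre-sigmoid scale parameter
optimizer = th.optim.Adam((loc, raw))

for i in range(10):
    scale = raw.sigmoid()
    if scales_collapsed(scale):
        break  # stop before log_prob blows up
    N = th.distributions.Normal(loc, scale)
    loss = -N.log_prob(th.tensor(0.5))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

With `raw = -45.` the check trips on the very first iteration, so no update is ever applied and the parameters stay finite, but I am not sure this is robust enough for real training.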