I have a neural network representing the policy for an RL agent. Sometimes the training process fails with `nan` values in the model parameters. I realized that it happens when the policy is iterated at a very similar `loc` for `Normal` when the `scale` is already very small. I have reproduced the error below.
- It happens when the training process keeps iterating even after the values for `loc` and `scale` have converged well. Is there any way to detect this and stop the iteration? Things could get complicated if we have an array of `loc` and `scale` values, though.
- Is `sigmoid` a good choice for an activation function whose output is used as the `scale` for `Normal`?
import torch as th

loc = th.tensor(0.5, requires_grad=True)
scale = th.tensor(-45., requires_grad=True)  # sigmoid(-45) is vanishingly small
optimizer = th.optim.Adam((loc, scale))

for i in range(10):
    # sigmoid keeps the scale positive, but here it is nearly zero
    N = th.distributions.Normal(loc, scale.sigmoid())
    loss = -N.log_prob(th.tensor(0.5))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
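For context, here is a minimal sketch of the kind of safeguard I am asking about: flooring the transformed scale and checking the parameters for `nan` after each step. The floor value `MIN_SCALE = 1e-6` and the early-stop check are my own guesses, not anything prescribed by PyTorch.

```python
import torch as th

loc = th.tensor(0.5, requires_grad=True)
scale = th.tensor(-45., requires_grad=True)
optimizer = th.optim.Adam((loc, scale))

MIN_SCALE = 1e-6  # hypothetical floor; the right value is problem-dependent

for i in range(10):
    # clamp the transformed scale away from zero so log_prob stays finite;
    # where the clamp is active, no gradient flows back into `scale`
    sigma = scale.sigmoid().clamp(min=MIN_SCALE)
    N = th.distributions.Normal(loc, sigma)
    loss = -N.log_prob(th.tensor(0.5))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # stop the iteration as soon as any parameter becomes nan
    if th.isnan(loc).any() or th.isnan(scale).any():
        break
```

This avoids the blow-up in the toy repro, but it only masks the symptom for an array of `loc`/`scale` pairs, which is why I am asking whether there is a principled way to detect convergence and stop.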