I have a neural network representing the policy of an RL agent. Sometimes the training process fails with `nan` values in the model parameters.

I realized that it happens when training keeps iterating while the `loc` of the `Normal` barely changes between steps and the `scale` is already very small. I have reproduced the error below.

- It happens when the training process keeps iterating even after the values for `loc` and `scale` are well converged. Is there any way to detect this and stop the iteration? Things could get complicated, though, if we have an array of `loc` and `scale` values instead of scalars.
- Is `sigmoid` a good choice for the activation function whose output is used as the `scale` of `Normal`?
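On the `sigmoid` question: part of what I observe is that `sigmoid` saturates, so at `-45` it outputs roughly `3e-20`, and powers of such a small `scale` underflow float32 during the backward pass. A minimal illustration (this is my understanding of the mechanism; the exact formula autograd uses internally is an assumption on my part):

```
import torch as th

# sigmoid saturates: at -45 its output is tiny but still nonzero
s = th.sigmoid(th.tensor(-45.))
print(s)  # roughly 2.9e-20

# powers of the scale appear in the backward pass of log_prob;
# already scale**4 underflows to exactly zero in float32
print(s ** 4)  # tensor(0.)

# a 0/0 anywhere in the chain rule then yields nan
print(th.tensor(0.) / (s ** 4))  # tensor(nan)
```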

```
import torch as th

loc = th.tensor(0.5, requires_grad=True)
scale = th.tensor(-45., requires_grad=True)  # sigmoid(-45) is ~3e-20
optimizer = th.optim.Adam((loc, scale))

for i in range(10):
    # scale.sigmoid() maps the raw parameter to a positive std-dev
    N = th.distributions.Normal(loc, scale.sigmoid())
    # maximize the likelihood of observing 0.5 (== loc, so it is converged)
    loss = -N.log_prob(th.tensor(0.5))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```
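For reference, here is the kind of stopping check I have in mind for the first question (a sketch; `grads_are_finite` is a hypothetical helper I wrote, and I am unsure whether it is robust enough):

```
import torch as th

def grads_are_finite(params):
    # th.isfinite is elementwise, so the same check works when
    # loc and scale are arrays instead of scalars
    return all(
        p.grad is not None and bool(th.isfinite(p.grad).all())
        for p in params
    )

# Intended placement in the loop above (hypothetical):
#     loss.backward()
#     if not grads_are_finite((loc, scale)):
#         break  # stop before optimizer.step() writes nan into the weights
#     optimizer.step()

# Sanity check of the helper on hand-made gradients:
good = th.zeros(2, requires_grad=True)
good.grad = th.tensor([0.1, 0.2])
bad = th.zeros(2, requires_grad=True)
bad.grad = th.tensor([float("nan"), 0.2])
print(grads_are_finite([good]))  # True
print(grads_are_finite([bad]))   # False
```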