Normal(): optimization step introduces nan into the model parameters

I have a neural network representing the policy for an RL agent.
Sometimes the training process fails with nan values in the model parameters.

I realized that it happens when the optimization keeps running after the Normal's loc has essentially converged to the target value and the scale is already very small. I have reproduced the error with a minimal example below.

  1. The failure occurs when the training loop keeps iterating even after loc and scale have already converged. Is there a way to detect this and stop the iteration? Things get more complicated when loc and scale are arrays rather than scalars. (A nan check I am considering is sketched after the repro code below.)
  2. Is sigmoid a good choice for the activation function that produces the scale of the Normal? (A softplus-based alternative I am considering is sketched at the end.)
import torch as th

# loc already equals the target value; scale.sigmoid() is ~2.9e-20,
# so the Normal is extremely narrow from the very first step.
loc = th.tensor(0.5, requires_grad=True)
scale = th.tensor(-45., requires_grad=True)

optimizer = th.optim.Adam((loc, scale))

for i in range(10):
    N = th.distributions.Normal(loc, scale.sigmoid())
    loss = -N.log_prob(th.tensor(0.5))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # loc and scale end up as nan within a few iterations
    print(i, loss.item(), loc.item(), scale.item())
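
For question 1, this is a minimal sketch of what I am considering: checking the gradients for nan with torch.isnan before each step and stopping. The has_nan helper is just a name I made up for this example; since torch.isnan works elementwise, the same check should also cover the case where loc and scale are arrays.

import torch as th

def has_nan(*tensors):
    # True if any element of any tensor is nan
    return any(th.isnan(t).any() for t in tensors)

loc = th.tensor(0.5, requires_grad=True)
scale = th.tensor(-45., requires_grad=True)
optimizer = th.optim.Adam((loc, scale))

for i in range(10):
    N = th.distributions.Normal(loc, scale.sigmoid())
    loss = -N.log_prob(th.tensor(0.5))
    optimizer.zero_grad()
    loss.backward()
    # check the gradients before stepping, so the parameters never get corrupted
    if has_nan(loc.grad, scale.grad):
        print(f"nan gradient at step {i}, stopping")
        break
    optimizer.step()

torch.autograd.set_detect_anomaly(True) also looks relevant for locating where the nan is first produced in backward, though I understand it is meant for debugging rather than for a permanent guard.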
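
For question 2, one alternative I am considering instead of sigmoid is softplus plus a small floor on the scale, so the standard deviation can never shrink to (numerically) zero. This is a sketch under my own assumptions; the MIN_SCALE value of 1e-3 is an arbitrary floor I picked, not an established recipe.

import torch as th
import torch.nn.functional as F

MIN_SCALE = 1e-3  # arbitrary lower bound to keep the Normal from collapsing

loc = th.tensor(0.5, requires_grad=True)
scale_param = th.tensor(-45., requires_grad=True)
optimizer = th.optim.Adam((loc, scale_param))

for i in range(10):
    # softplus is unbounded above (unlike sigmoid), and the floor keeps it away from 0
    scale = F.softplus(scale_param) + MIN_SCALE
    N = th.distributions.Normal(loc, scale)
    loss = -N.log_prob(th.tensor(0.5))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

Does bounding the scale like this make sense, or is there a better-behaved parameterization for the scale of a Normal policy?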