Categorical distribution breaking

Based on the error message, it seems the actor is producing NaN outputs after a few training iterations. Are you seeing the value range of its outputs increase during training? If so, the values could eventually overflow.
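One way to check this is to log the largest absolute actor output at every step and fail fast as soon as a non-finite value shows up. The sketch below uses only the standard library. `OutputRangeMonitor` is a hypothetical helper, not part of any library, and it assumes you pass it a flat list of floats per step (for a PyTorch tensor, e.g. `logits.detach().flatten().tolist()`).

```python
import math
from dataclasses import dataclass, field


@dataclass
class OutputRangeMonitor:
    """Hypothetical helper: track the range of raw actor outputs per step."""

    # List of (step, max |output|) pairs, one per call to update().
    history: list = field(default_factory=list)

    def update(self, step, values):
        values = list(values)
        # math.isfinite is False for NaN and +/-inf, i.e. outputs already broke.
        non_finite = [v for v in values if not math.isfinite(v)]
        if non_finite:
            raise FloatingPointError(
                f"step {step}: {len(non_finite)} non-finite output(s)"
            )
        max_abs = max(abs(v) for v in values)
        self.history.append((step, max_abs))
        return max_abs

    def is_growing(self, window=3):
        """True if max |output| strictly increased over each of the last `window` steps."""
        if len(self.history) < window + 1:
            return False
        recent = [m for _, m in self.history[-(window + 1):]]
        return all(b > a for a, b in zip(recent, recent[1:]))


if __name__ == "__main__":
    monitor = OutputRangeMonitor()
    # Made-up logits whose magnitude keeps growing, as a diverging actor's might.
    fake_logits = [[0.1, -0.2], [1.5, -3.0], [20.0, -40.0], [300.0, -900.0]]
    for step, logits in enumerate(fake_logits):
        monitor.update(step, logits)
    print(monitor.is_growing())  # True: 0.2 -> 3.0 -> 40.0 -> 900.0
```

If `is_growing()` keeps returning `True` in the steps before the NaNs appear, that would support the overflow explanation rather than, for example, a single bad input batch.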