Loss not converging in DDPG

Hi @vmoens, I found that my problem is that the output of the actor network is clamped to a threshold value, which causes the gradient to vanish.
How can I deal with this?
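To illustrate what I mean (a minimal, made-up example, not my actual network): once the output saturates past the clamp bounds, no gradient flows back to the actor parameters.

```python
import torch

# Toy example: hard-clamping the actor output zeroes the gradient
# whenever the pre-clamp value is outside the bounds.
pre_activation = torch.tensor([2.5], requires_grad=True)  # already past the bound
action = torch.clamp(pre_activation, -1.0, 1.0)           # constrained output
action.sum().backward()
print(pre_activation.grad)  # tensor([0.]) -> the actor stops learning here
```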
I have changed the actor loss function to:
actor_loss = -policy_Q.mean() + kappa_v * (torch.pow(torch.max(penalty[0] - zeta_s, 0)[0], 2) + torch.pow(torch.max(-penalty[0] - zeta_s, 0)[0], 2)) + kappa_a * (torch.pow(torch.max(penalty[1] - zeta_t, 0)[0], 2) + torch.pow(torch.max(-penalty[1] - zeta_t, 0)[0], 2))
It uses a pre-activation penalty to avoid the vanishing gradient, but it does not work: after training for a few epochs the gradient still vanishes.
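For reference, this is roughly what I intended the penalty term to do (a minimal sketch with made-up shapes; `penalty[0]`/`penalty[1]` hold the pre-activation values for the two action components, `zeta_s`/`zeta_t` are the thresholds and `kappa_v`/`kappa_a` the weights):

```python
import torch

def pre_activation_penalty(pre_act: torch.Tensor, threshold: float) -> torch.Tensor:
    # Quadratic hinge on how far the pre-activation exceeds the threshold
    # in either direction; it is zero inside the allowed band.
    over = torch.clamp(pre_act - threshold, min=0.0)
    under = torch.clamp(-pre_act - threshold, min=0.0)
    return (over.pow(2) + under.pow(2)).mean()

# actor_loss = -policy_Q.mean() \
#     + kappa_v * pre_activation_penalty(penalty[0], zeta_s) \
#     + kappa_a * pre_activation_penalty(penalty[1], zeta_t)
```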
What should I do?
Hoping for your reply.