`calculate_gain('tanh')`
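For reference, this is roughly how that gain plugs into the `torch.nn.init` functions; the `Linear` layer and its sizes below are just stand-ins:

```python
import torch
from torch import nn

gain = nn.init.calculate_gain('tanh')  # 5/3 for tanh
print(gain)

# pass the gain to an init that accepts it, e.g. Xavier/Glorot
layer = nn.Linear(128, 128)  # hypothetical layer
nn.init.xavier_normal_(layer.weight, gain=gain)
```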

So for these two, that might work. My impression was that the “usual” way to counter exploding gradients is gradient clipping (most prominently in RNNs, where tanh is still very common, too).
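A minimal sketch of what I mean, with a made-up RNN and training step; the `clip_grad_norm_` call before the optimizer step is the only point here:

```python
import torch

# hypothetical model/optimizer, just to show where clipping slots in
model = torch.nn.RNN(input_size=10, hidden_size=20)  # tanh is the default nonlinearity
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

inputs = torch.randn(5, 3, 10)   # (seq_len, batch, input_size)
targets = torch.randn(5, 3, 20)

output, _ = model(inputs)
loss = torch.nn.functional.mse_loss(output, targets)

optimizer.zero_grad()
loss.backward()
# clip the global gradient norm before the update to counter exploding gradients
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```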

I thought Klambauer et al., Self-Normalizing Neural Networks, had the more elaborate insights on this, including the gradients.
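For completeness, a sketch of the SELU setup from that paper as I read it; the layer sizes are invented, and the lecun-normal-style init is my reading of the paper's recommendation, not something PyTorch ships as a named initializer:

```python
import math
import torch
from torch import nn

# SELU-based MLP in the spirit of Klambauer et al.; layer sizes are made up
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.SELU(),
    nn.AlphaDropout(p=0.1),  # the dropout variant designed to pair with SELU
    nn.Linear(256, 10),
)

# lecun-normal init (std = 1/sqrt(fan_in)), as recommended in the paper
for m in model.modules():
    if isinstance(m, nn.Linear):
        nn.init.normal_(m.weight, mean=0.0, std=1.0 / math.sqrt(m.in_features))
        nn.init.zeros_(m.bias)
```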

Best regards

Thomas