tom (Thomas V) July 8, 2018, 6:11pm #5
So for these two:
jpeg729: tanh seems stable with pretty much any gain > 1. With gain 5/3 the output stabilises at ~0.65, but the gradients start to explode after around 10 layers. Gain 1.1 works much better, giving an output std stable around 0.30 and gradients that are much more stable, though they do grow slowly.
Then that might work. My impression was that the “usual” way to counter exploding gradients is clipping. (Most prominently in RNNs, where tanh is still very common, too.)
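A minimal sketch of what I mean, assuming a plain stack of Linear + Tanh layers like the one you describe (the depth, width, learning rate and clipping threshold here are made up for illustration):

```python
import torch
import torch.nn as nn

# Deep stack of Linear + Tanh layers, initialised with a chosen gain.
# Note: nn.init.calculate_gain('tanh') gives 5/3, the gain mentioned above.
depth, width = 30, 256
layers = []
for _ in range(depth):
    lin = nn.Linear(width, width)
    nn.init.xavier_normal_(lin.weight, gain=1.1)
    nn.init.zeros_(lin.bias)
    layers += [lin, nn.Tanh()]
model = nn.Sequential(*layers)

opt = torch.optim.SGD(model.parameters(), lr=1e-2)
x = torch.randn(64, width)
target = torch.randn(64, width)

loss = nn.functional.mse_loss(model(x), target)
loss.backward()

# Clip the global gradient norm before the optimiser step.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
opt.step()
```

clip_grad_norm_ rescales all gradients together when their combined norm exceeds max_norm, so the update direction is preserved even when the raw gradients have grown large.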
jpeg729: selu only works with gain 1 and gives output std ~= 1.00, but the grad slowly explodes after 10-20 layers.
I thought Klambauer et al., Self-Normalizing Neural Networks, had the detailed analysis for this, including the gradients.
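For reference, a quick sketch of the setup the paper analyses, i.e. gain 1 with a LeCun-normal-style init (std = 1/sqrt(fan_in)); the depth, width and batch size here are arbitrary:

```python
import math
import torch
import torch.nn as nn

# SELU stack with the initialisation the paper assumes:
# W ~ N(0, 1/fan_in), b = 0.
depth, width = 50, 256
layers = []
for _ in range(depth):
    lin = nn.Linear(width, width)
    nn.init.normal_(lin.weight, mean=0.0, std=1.0 / math.sqrt(width))
    nn.init.zeros_(lin.bias)
    layers += [lin, nn.SELU()]
model = nn.Sequential(*layers)

x = torch.randn(1024, width, requires_grad=True)
out = model(x)
print("output std:", out.std().item())  # should stay close to 1

out.sum().backward()
print("input grad std:", x.grad.std().item())
```

If I recall correctly, the paper's fixed-point argument assumes exactly this weight distribution, which would explain why deviating from gain 1 breaks the self-normalising behaviour.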
Best regards
Thomas