Will function torch.tanh() cause backpropagated grad be nan?

Deng_Tony · September 10, 2019, 2:06am

Hi,

Thank for your reply, and is there possible that if tanh_params is large and the (input - torch.mean(input) block is very close to zero, then it will return a very large gradient and finally cause gradient explosion to make the situation I came across?

Is this comment stated a good way to check and debug my problem, [How to check for vanishing/exploding gradients]