Can torch.tanh() cause the backpropagated gradient to become NaN?


I am working on learning an image filter, but during training, after some iterations, the loss and all learned parameters become NaN.

Here is the main part of my code:

import torch
import torch.nn as nn
from torchvision import models

class net(nn.Module):
    def __init__(self):
        super(net, self).__init__()
        # ResNet-18 backbone; its final layer outputs 1000 features
        self.encoder = models.resnet18(pretrained=True)
        self.fc1 = nn.Linear(1000, 256)
        self.fc2 = nn.Linear(256, 128)
        self.tanh_params = nn.Linear(128, 1)
        self.relu = nn.ReLU()

    def forward(self, input):
        input = self.encoder(input)
        input = self.relu(self.fc2(self.relu(self.fc1(input))))
        # exp keeps the learned scale positive, but it is unbounded above
        tanh_params = torch.exp(self.tanh_params(input))
        image = torch.tanh(tanh_params * (input - torch.mean(input)))
        image = image * input
        return image

The code looks fine to me, but the result is unsatisfactory. Is something wrong or missing in this code block, or could the error be caused by my dataset?

Thank you to anyone who reads this and is willing to give me a hand.


No, tanh cannot return NaNs, as its gradient is well defined everywhere.
If this happens after some iterations, you should make sure your loss is well behaved and is not simply diverging to very large values until it becomes NaN.
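One way to check whether the loss is diverging is to test it every iteration and stop as soon as it is non-finite or suspiciously large. The following is only a minimal sketch: the helper name `check_loss` and the threshold `limit` are made up for illustration and are not part of PyTorch.

```python
import math
import torch

def check_loss(loss: torch.Tensor, step: int, limit: float = 1e6) -> bool:
    """Return True if the scalar loss is finite and below `limit`.

    `limit` is an arbitrary illustrative threshold, not a recommended value.
    """
    value = loss.item()
    if not math.isfinite(value):
        print(f"step {step}: loss is {value}, stopping")
        return False
    if abs(value) > limit:
        print(f"step {step}: loss {value:.3e} is suspiciously large")
        return False
    return True
```

Calling `check_loss(loss, step)` right after the loss is computed, and breaking out of the training loop when it returns False, should show whether the loss grows steadily before turning into NaN or jumps there in a single step.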


Thanks for your reply. Is it possible that, if tanh_params is large and the (input - torch.mean(input)) term is very close to zero, the backward pass returns a very large gradient, and the resulting gradient explosion causes the situation I came across?
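The scenario in this question can be tried out in isolation. The sketch below uses a fixed scale `a` as a stand-in for a large tanh_params (the value 1000 is an arbitrary assumption) and evaluates the gradient at zero. By the chain rule, d/dx tanh(a*x) = a * (1 - tanh(a*x)**2), which equals a at x = 0. It also shows that exp can overflow to inf in float32, and that inf * 0 is NaN.

```python
import torch

# Gradient of tanh(a * x) with respect to x, evaluated at x = 0.
x = torch.zeros(1, requires_grad=True)
a = torch.tensor([1000.0])  # assumed value, stands in for a large tanh_params
torch.tanh(a * x).sum().backward()
grad_at_zero = x.grad.item()  # chain rule: a * (1 - tanh(0)**2) = a
print("gradient at zero:", grad_at_zero)

# exp overflows in float32 once its argument is large enough.
big = torch.exp(torch.tensor([100.0]))
print("exp(100) in float32:", big.item())

# Multiplying an infinite scale by an exactly-zero term yields NaN.
prod = big * torch.zeros(1)
print("inf * 0:", prod.item())
```

So tanh itself is bounded, but the gradient flowing back through the scaled argument grows with the scale, and an overflowing exp can produce NaN already in the forward pass. Whether either effect is what actually happens in the model above would still need to be confirmed by inspecting the real values.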

Is the approach described in this post a good way to check and debug my problem? [How to check for vanishing/exploding gradients]


If I’m not mistaken, the gradient of tanh is never large, right? It stays between 0 and 1.
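This is easy to verify numerically. The small sketch below evaluates the derivative of tanh, which is 1 - tanh(x)**2, with autograd over a range of inputs; the range and number of points are arbitrary choices.

```python
import torch

# Evaluate d/dx tanh(x) on a grid of points using autograd.
x = torch.linspace(-10.0, 10.0, steps=2001, requires_grad=True)
torch.tanh(x).sum().backward()
grad = x.grad

# The derivative 1 - tanh(x)**2 peaks at x = 0 and decays toward 0 in the tails.
print("min gradient:", grad.min().item())
print("max gradient:", grad.max().item())
```

Note that this bound applies to tanh with respect to its own argument. If the argument is itself a scaled quantity, the chain rule multiplies the gradient by that scale.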

I would recommend printing some values first to make sure that they become NaN because they are too big.
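For the gradients, one possible way to do this is to print the norm of every parameter's gradient after calling backward(). The helper `report_grad_norms` below is a hypothetical name, and the tiny linear model is only a stand-in for the real network.

```python
import torch
import torch.nn as nn

def report_grad_norms(model: nn.Module) -> dict:
    """Collect the L2 norm of each parameter's gradient, keyed by parameter name."""
    norms = {}
    for name, param in model.named_parameters():
        if param.grad is not None:
            norms[name] = param.grad.norm().item()
    return norms

# Hypothetical stand-in model and loss, just to make the sketch runnable.
model = nn.Linear(4, 1)
loss = model(torch.randn(8, 4)).pow(2).mean()
loss.backward()

for name, norm in report_grad_norms(model).items():
    print(f"{name}: grad norm = {norm:.4e}")
```

If the norms grow by orders of magnitude over a few iterations before the NaN appears, that points to exploding gradients, and torch.nn.utils.clip_grad_norm_ or a smaller learning rate may be worth trying. torch.autograd.set_detect_anomaly(True) can also help locate the operation that first produces a NaN in the backward pass.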