How to fix a loss that becomes nan after using torch.log()

crissallan · August 27, 2019, 10:13pm

I ran into a ‘nan’ loss after introducing a torch.log(t) operation in the forward pass. When I remove the log operation, things work fine. I think it’s because the tensor t contains very small or zero values. Does that mean that if the forward pass produces some ‘nan’ values, the loss must also be ‘nan’? E.g. if I feed a ‘nan’ tensor into an nn.Linear layer, what will the output of this layer be?

However, if I do the log operation in numpy and then convert the logged data to a tensor, things still work fine, with no nan in the loss.

I’d like to know how extremely small numbers are treated in backpropagation in PyTorch, and what the grad of a ‘nan’ loss is.
Thank you so much!

alekhka · August 27, 2019, 11:59pm

This is very likely because the input is a negative number.

Since the logarithm is only defined on the domain x > 0, you have to ensure that the input is strictly positive (neither negative nor zero).
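To make the domain issue concrete, here is a small sketch of what torch.log returns for negative, zero, and tiny positive inputs, and what the gradient looks like at zero. The sample values are made up for illustration and are not from the original poster's model.

```python
import torch

# Illustrative inputs: negative, zero, tiny positive, and one.
t = torch.tensor([-1.0, 0.0, 1e-30, 1.0])
out = torch.log(t)
print(out)  # negative -> nan, zero -> -inf, positive -> finite

# The derivative of log(x) is 1/x, so the gradient at exactly zero is infinite.
x = torch.tensor(0.0, requires_grad=True)
torch.log(x).backward()
print(x.grad)
```

So a negative input gives nan directly, while a zero input gives -inf in the forward pass and an infinite gradient in the backward pass, either of which can turn into nan a few operations later.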

I would use a non-linearity like ReLU or sigmoid to ensure non-negativity and then add a small ‘epsilon’ to ensure non-zero:

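import torch
import torch.nn.functional as F

eps = 1e-7
t = F.relu(t)           # negative entries become zero
t = torch.log(t + eps)  # eps keeps the argument strictly positive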
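As a rough check that this keeps both the loss and the gradients finite, here is a minimal sketch. The helper name safe_log and the sample tensor are assumptions for illustration, not part of any library.

```python
import torch
import torch.nn.functional as F

def safe_log(t, eps=1e-7):
    # Hypothetical helper: ReLU removes negatives, eps avoids log(0).
    return torch.log(F.relu(t) + eps)

# Illustrative input containing a negative value and an exact zero.
t = torch.tensor([-2.0, 0.0, 3.0], requires_grad=True)
loss = safe_log(t).sum()
loss.backward()

print(loss)    # finite
print(t.grad)  # finite everywhere
```

One thing to keep in mind with this approach: ReLU passes zero gradient for negative inputs, so those entries will not receive a learning signal through the log. torch.clamp(t, min=eps) is another common way to keep the argument positive, with the same caveat for clamped entries.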

Almost every arithmetic operation on nan results in nan, so once a nan appears, everything downstream will become nan quickly.
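To answer the nn.Linear part of the question, here is a small sketch of feeding an input with a single nan entry through a linear layer. The layer sizes and input values are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(3, 2)  # arbitrary sizes, for illustration only

# One nan entry in the input is enough.
x = torch.tensor([[1.0, float("nan"), 2.0]])
y = layer(x)
print(y)  # every output mixes in the nan input, so all outputs are nan

loss = y.sum()
loss.backward()
print(loss)               # nan
print(layer.weight.grad)  # contains nan
```

Every output unit is a weighted sum over all inputs, so one nan input contaminates every output. After an optimizer step with nan gradients, the affected weights become nan too. To find where the first nan shows up, torch.isnan(tensor).any() on intermediate results or torch.autograd.set_detect_anomaly(True) can help.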