I ran into a NaN loss problem after introducing a torch.log(t) operation in the forward pass. When I remove the log operation, everything works fine. I think it's because the tensor t contains very small or zero values. Does that mean that if the forward pass produces NaN values, the loss must also be NaN? For example, if I feed a NaN tensor into an nn.Linear layer, what will the output of that layer be?
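Here is a minimal sketch of the situation I have in mind (the values of t are made up for illustration):

```python
import torch
import torch.nn as nn

# log of zero is -inf, and log of a tiny value is a large negative number
t = torch.tensor([0.0, 1e-30, 1.0])
print(torch.log(t))  # tensor([-inf, -69.0776, 0.0000])

# feeding a tensor with a NaN into a linear layer: each output unit
# takes a dot product over all inputs, so one NaN makes every output NaN
layer = nn.Linear(3, 2)
x = torch.tensor([[float('nan'), 1.0, 2.0]])
print(layer(x))  # tensor([[nan, nan]], grad_fn=<AddmmBackward0>)
```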

However, if I do the log operation in numpy and then convert the logged data to a tensor, things still work fine: no NaN in the loss.
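For example, this variant of what I tried does not produce NaN (again with made-up values):

```python
import numpy as np
import torch

# the log happens in numpy, outside the autograd graph
t = np.array([1e-40, 0.5, 1.0], dtype=np.float32)
logged = torch.from_numpy(np.log(t))

# `logged` is a leaf tensor with no grad history, so backward()
# never has to differentiate the log (d log(t)/dt = 1/t, which
# explodes for tiny t) -- my guess at why the NaN disappears
loss = logged.sum()
print(loss)  # finite
```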

I'd like to know how extremely small numbers are treated during backpropagation in PyTorch, and what the gradient of a NaN loss is.
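Concretely, these are the two cases I'm confused about:

```python
import torch

# case 1: the forward value log(t) can still be finite for tiny t,
# but the gradient 1/t overflows to inf in float32
t = torch.tensor([1e-45], requires_grad=True)
torch.log(t).backward()
print(t.grad)  # tensor([inf])

# case 2: backpropagating from a NaN loss
x = torch.tensor([1.0], requires_grad=True)
loss = x * float('nan')
loss.backward()
print(x.grad)  # tensor([nan])
```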

Thank you so much!