Weights and outputs become nan

Hi! I'm facing the following problem:
I'm trying to use a custom loss (RMSLE):

import torch
import torch.nn as nn

class RMSLELoss(nn.Module):
    def __init__(self):
        super().__init__()
        self.mse = nn.MSELoss(reduction='none')

    def forward(self, pred, actual):
        # per-element sqrt, no reduction
        return torch.sqrt(self.mse(torch.log(pred + 1), torch.log(actual + 1)))

During training I use loss.mean() to average the result:

output = self._model(batch).squeeze(dim=1)
target = batch['target'].to(self._device)
loss = loss_func(output, target)
loss = loss.mean()                    

So I just average the loss myself, but after some iterations the weights and outputs become NaNs.
However, if I replace the loss in __init__ with a plain MSELoss() (default reduction, so it averages by itself), all the problems go away. Why does this happen? With plain MSELoss() instead of my custom loss, the problem never shows up.
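For illustration (a toy sketch, not from the original post): taking the square root per element and then averaging is not the same as averaging first and then taking the square root, so the two reductions really do compute different quantities:

```python
import torch

# Toy per-element squared errors
se = torch.tensor([1.0, 4.0, 9.0])

mean_of_sqrt = torch.sqrt(se).mean()   # (1 + 2 + 3) / 3 = 2.0
sqrt_of_mean = torch.sqrt(se.mean())   # sqrt(14 / 3) ≈ 2.16

print(mean_of_sqrt.item(), sqrt_of_mean.item())
```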

Are your pred and actual tensors strictly positive? The log of a negative number is undefined and will produce a NaN.
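A quick sketch of what happens when the argument of the log is not strictly positive (toy values, not from the original post): anything that makes pred + 1 negative yields NaN, and pred + 1 == 0 yields -inf, either of which will poison the weights through backprop:

```python
import torch

x = torch.tensor([-2.0, -1.0, 0.5])
y = torch.log(x + 1)  # log(-1) -> nan, log(0) -> -inf, log(1.5) -> finite
print(y)
```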

I can check that, but the question is why they would become negative only in the case where I average the loss myself?
And as I said, the weights of the layers always become NaNs.

I'm not sure the expressions are identical: one computes the mean inside the sqrt while the other does it on the outside, so they can have different numerical properties.
e.g.,
d/dx((sqrt(x) + sqrt(y)) / 2) = 1 / (4 sqrt(x))
whereas
d/dx(sqrt((x + y) / 2)) = 1 / (2 sqrt(2) sqrt(x + y))
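A minimal sketch of why this matters numerically (my example, not from the thread): the gradient of sqrt(z) is 1 / (2 sqrt(z)), which blows up as z -> 0. With the per-element sqrt (reduction='none'), a single perfectly predicted sample (squared error exactly 0) produces an infinite gradient, which then turns the weights into NaNs:

```python
import torch

# Two per-element squared errors, one of them exactly zero
z = torch.tensor([0.0, 1.0], requires_grad=True)

# mean-of-sqrt reduction, as in the custom RMSLE loss
torch.sqrt(z).mean().backward()

print(z.grad)  # first entry is inf: 1 / (2 * sqrt(0))
```

With the mean taken inside the sqrt instead, the gradient stays finite as long as the *average* squared error is nonzero.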
