How can the model update its weights when loss=nan in some batches?

I have a custom loss function that contains log(y).
y is one of the model outputs, and I don’t want to restrict y to be positive with an activation function.

So the loss can return ‘nan’, because log(y) is undefined (nan) for y < 0.
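A quick way to confirm this behavior is to call `torch.log` directly on a few sample values. This minimal sketch assumes only that PyTorch is installed: PyTorch does not raise an error for invalid inputs, it returns `nan` for negative values and `-inf` for zero.

```python
import torch

# log is only defined for positive inputs; instead of raising an error,
# PyTorch returns nan for negative values and -inf for zero
y = torch.tensor([-1.0, 0.0, 1.0])
out = torch.log(y)
print(out)              # nan for -1.0, -inf for 0.0, 0.0 for 1.0
print(torch.isnan(out)) # True only for the negative entry
```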

Somehow, the loss returns ‘nan’ in some batches, yet the model keeps training until it converges.

[Screenshot 2021-03-18 at 5.45.45 PM]

I wonder: how does the model update its weights when the loss is ‘nan’?

Thank you

I don’t know exactly how the model is updated, but you could check all .grad attributes after the loss.backward() call and see if they contain nan values.
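Checking the gradients after `loss.backward()` could look like the sketch below. The helper name `grads_with_nan` and the small `torch.nn.Linear` model are illustrative assumptions, not part of the original code.

```python
import torch

def grads_with_nan(model):
    """Return the names of parameters whose .grad contains at least one nan."""
    bad = []
    for name, param in model.named_parameters():
        # .grad stays None for parameters that were not reached by backward()
        if param.grad is not None and torch.isnan(param.grad).any():
            bad.append(name)
    return bad

# Hypothetical example: log(y) * log(y) turns negative outputs into nan gradients
torch.manual_seed(0)
model = torch.nn.Linear(3, 4)
x = torch.randn(8, 3)
y = torch.log(model(x))
loss = (y * y).mean()
loss.backward()
print(grads_with_nan(model))  # names of the parameters with nan gradients, if any
```

Alternatively, `torch.autograd.set_detect_anomaly(True)` makes `backward()` raise an error at the operation whose backward produced nan values, at the cost of slower training.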
Generally, a nan loss could break your model as seen here:


import torch

x = torch.randn(10, requires_grad=True)
optimizer = torch.optim.SGD([x], lr=1.)

y = torch.log(x)  # nan wherever x < 0
y = y * y
loss = y.mean()
print(loss)
# tensor(nan, grad_fn=<MeanBackward0>)

loss.backward()
print(x.grad)
# tensor([    nan, -0.0501,     nan, -0.0420,     nan,     nan,     nan,     nan,
#         -0.2245,     nan])

optimizer.step()
print(x)
# tensor([   nan, 0.8653,    nan, 0.8805,    nan,    nan,    nan,    nan, 0.7679,
#            nan], requires_grad=True)

Are you sure optimizer.step() is actually performed, or does your code have specific guards to skip it?
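A guard of the kind asked about here might check the loss with `torch.isfinite` and skip the update for that batch. This is a sketch under assumptions: the `train_step` helper, the loss, and the model are hypothetical, and skipping the step is only one way to handle a non-finite loss.

```python
import torch

def train_step(model, optimizer, x):
    """Run one update; return False (and skip optimizer.step) if the loss is not finite."""
    optimizer.zero_grad()
    loss = torch.log(model(x)).pow(2).mean()  # hypothetical loss containing log(y)
    if not torch.isfinite(loss):
        # Skip backward and the update so nan/inf cannot reach the weights
        return False
    loss.backward()
    optimizer.step()
    return True

torch.manual_seed(0)
model = torch.nn.Linear(3, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
x = torch.randn(4, 3)
print(train_step(model, optimizer, x))  # False if any output was negative
```

Note that a guard on the loss alone would skip every batch with a ‘nan’ loss, so with such a guard in place the weights could not change on those batches.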

Thank you so much for your reply!
Your sample code inspired the following toy model:

import torch

# input
x = torch.randn(5, requires_grad=True)

# model: a single bias-free linear layer, so its output can be negative
class Simple(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(5, 1, bias=False)

    def forward(self, in_):
        return self.layer(in_)

model = Simple()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)

# log of a negative model output gives nan in the forward pass
y = torch.log(model(x))
loss = y

print('input x: ', x)
# input x: tensor([-0.6195,  1.2090,  0.7255,  0.3041, -1.8422], requires_grad=True)
print('layer weights: ', model.layer.weight.data)
# layer weights: tensor([[ 0.0566,  0.3471,  0.0256, -0.3857,  0.2482]])
print('loss: ', loss)
# loss: tensor([nan], grad_fn=<LogBackward>)

Then I backpropagated to compute the gradients and applied an optimizer step:

loss.backward()
print('layer weights: ', model.layer.weight.data)
# layer weights: tensor([[ 0.0566,  0.3471,  0.0256, -0.3857,  0.2482]])
print('layer weights grad: ', model.layer.weight.grad)
# layer weights grad: tensor([[ 3.6150, -7.0544, -4.2334, -1.7744, 10.7490]])

optimizer.step()
print('layer weights: ', model.layer.weight.data)
# layer weights: tensor([[ 0.0530,  0.3542,  0.0298, -0.3840,  0.2374]])

The model obtained gradients for the layer weights and updated the weights even though the loss was ‘nan’!
I think the optimizer may have some other strategy for updating the weights when it receives a ‘nan’ loss,
but I’m still looking for the answer.
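One possible explanation, worth verifying against your PyTorch version: autograd computes the backward of `log` as the incoming gradient divided by the input (`grad / input`), which is finite for any nonzero input even though `log` itself returns `nan` for negative values. In the toy model, the weight gradient is therefore `x / model(x)`; with the printed weights and input, `model(x)` is about -0.171, and `x / -0.171` reproduces the printed gradient. The `nan` only reaches the gradients when the `nan` output of `log` is itself used in a backward formula, as with `y * y` in the first example. The sketch below compares the two cases on a single negative value (the value -0.5 is an arbitrary choice).

```python
import torch

# Case 1: loss = log(y). The backward of log is grad / y, finite for y = -0.5
y1 = torch.tensor(-0.5, requires_grad=True)
loss1 = torch.log(y1)
loss1.backward()
print(loss1.item(), y1.grad.item())  # nan -2.0

# Case 2: loss = log(y) * log(y). The backward of the product multiplies
# by log(y), which is nan, so the gradient becomes nan as well
y2 = torch.tensor(-0.5, requires_grad=True)
log_y2 = torch.log(y2)
loss2 = log_y2 * log_y2
loss2.backward()
print(loss2.item(), y2.grad.item())  # nan nan
```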

Thank you so much again for your kind reply; you really saved me a lot of time!