RuntimeError: Function 'PowBackward1' returned nan values in its 1st output

I defined a loss function for my network, and it’s computed as:

    loss = pred.mm(pred.T) - (label.view(-1, 1) == label.view(1, -1)).to(dtype=torch.float32)
    loss = loss**2
    loss = torch.pow(loss, (2 - loss)**2)
    loss = loss.sum() / (label.size(0)**2 + 1.0)
    return loss

During training, the following error appeared:

    RuntimeError: Function 'PowBackward1' returned nan values in its 1st output.

and it is traced back to this line:

    loss = torch.pow(loss, (2 - loss)**2)

I'm not clear on how this error relates to this line of code. Can anyone help me out?

Hi,

What is the value of the loss when this happens?
Most likely you reach a value of the loss for which loss.pow((2-loss)**2) is not differentiable.
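
For example: since loss is squared on the previous line, it is non-negative, and if it lands exactly on 0 the gradient with respect to the exponent involves log(base), which is -inf at 0 and turns the gradient into nan. A minimal sketch of that failure mode (the zero base is made up for illustration):

    import torch

    # Anomaly mode is what turns a silent nan into the
    # "Function 'PowBackward1' returned nan values" RuntimeError.
    torch.autograd.set_detect_anomaly(True)

    base = torch.zeros(1, requires_grad=True)  # stands in for loss**2 hitting 0
    exponent = (2 - base) ** 2                 # tensor exponent -> PowBackward1
    out = torch.pow(base, exponent)

    # d(base**exponent)/d(exponent) = base**exponent * log(base)
    #                               = 0 * (-inf) = nan
    out.backward()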

Hi, I rewrote the loss and now it’s computed as:

    loss = pred.mm(pred.T) - (label.view(-1, 1) == label.view(1, -1)).to(dtype=torch.float32)
    loss = (loss**2).sum()
    loss = loss / (label.size(0)**2 + 1.0)

    return loss

where pred is computed as:

    pred = 1.0 / (1.0 + (-pred).exp())

however, another error occurred:

    RuntimeError: Function 'ExpBackward' returned nan values in its 0th output.

which is traced to the .exp() operation.
The loss value is 0.980656 when this happens. Can you help me figure this out?

The error disappeared after I reduced the initial learning rate by half, but I'm still not clear about the mechanism behind it.

In that case, what most likely happens is that the value of pred (before the sigmoid) grows so large in magnitude that (-pred).exp() overflows to inf, and that leads to nan values in the gradients. Halving the learning rate keeps the updates, and therefore pred, from blowing up.
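
A minimal sketch of that overflow, assuming pred reaches a large negative value (the -100.0 is made up for illustration; in float32, exp(100) is already inf):

    import torch

    torch.autograd.set_detect_anomaly(True)

    pred = torch.tensor([-100.0], requires_grad=True)  # hypothetical blown-up logit
    out = 1.0 / (1.0 + (-pred).exp())  # exp(100) overflows to inf, out becomes 0
    out.backward()                     # backward of exp multiplies 0 * inf = nan

Note that 1.0 / (1.0 + (-pred).exp()) is just the sigmoid; torch.sigmoid(pred) computes the same thing in a numerically stable way and avoids this overflow entirely.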

Hello @Hawk,
You mentioned that the error “is traced to this line”. Can you share how you traced it back?
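
For context: error messages of the form "Function 'PowBackward1' returned nan values in its 1st output" come from autograd's anomaly detection, so the trace was presumably obtained with something like the sketch below (compute_loss is a hypothetical stand-in for the posted loss code):

    import torch

    # With anomaly detection on, the forward pass records a traceback for
    # each operation; if a backward node then produces nan, autograd raises
    # a RuntimeError pointing at the forward line that created it.
    with torch.autograd.detect_anomaly():
        loss = compute_loss(pred, label)  # hypothetical stand-in
        loss.backward()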