nn.CrossEntropyLoss's gradient is zeroed out by my custom loss

Hi, PyTorchers:

I wrote my own loss as an nn.Module and combine it with nn.CrossEntropyLoss.

class MyLoss(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, input):
        # input: tensor from a network
        loss = Compute(input)
        # Numerical issue here: loss should be greater than 0,
        # but it sometimes comes out < 0
        return torch.sqrt(loss) if loss.item() > 0 else 0

mymodel = Net()
myloss = MyLoss()
ce = nn.CrossEntropyLoss()
total_loss = myloss(output) + ce(output, target)
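For reference, here is a self-contained version of the pattern above (Compute, Net, and the data are hypothetical stand-ins). Returning a plain Python 0 does not by itself zero the cross-entropy gradient, but the sqrt is worth guarding: its gradient blows up as its argument approaches zero, so clamping to a small positive floor is safer than branching on loss.item():

```python
import torch
import torch.nn as nn

class MyLoss(nn.Module):
    def forward(self, input):
        # Hypothetical stand-in for Compute(input): a quantity that is
        # mathematically >= 0 but may be slightly negative numerically.
        loss = (input ** 2).mean() - 1e-7
        # Clamp to a small positive floor before sqrt: avoids NaN from a
        # tiny negative input and the infinite sqrt gradient at exactly 0.
        return torch.sqrt(torch.clamp(loss, min=1e-12))

output = torch.randn(8, 10, requires_grad=True)
target = torch.randint(0, 10, (8,))

myloss = MyLoss()
ce = nn.CrossEntropyLoss()

total_loss = myloss(output) + ce(output, target)
total_loss.backward()
print(output.grad.abs().sum().item())  # nonzero: both terms contribute
```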

When MyLoss returns 0, the optimizer should still backpropagate through nn.CrossEntropyLoss, but it turns out that the gradient is zero. The problem might be the constant return, but the cross-entropy term alone should still have a gradient.

Has anyone come across this type of problem?


If you don’t initialize the parameters in the network, you’re likely to run into gradient problems.

Your code snippet should work, even if you return a zero in your custom loss function, as seen here:

import torch
import torch.nn as nn

output = torch.randn(10, 10, requires_grad=True)
target = torch.randint(0, 10, (10,))

criterion = nn.CrossEntropyLoss()

loss = 0 + criterion(output, target)

Could you check the value of ce before calling total_loss.backward()?
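To check that in isolation, a quick sketch along these lines (same toy tensors as above) inspects the cross-entropy value and confirms that adding a plain Python 0 does not zero its gradient:

```python
import torch
import torch.nn as nn

output = torch.randn(10, 10, requires_grad=True)
target = torch.randint(0, 10, (10,))

criterion = nn.CrossEntropyLoss()
ce_loss = criterion(output, target)
print(ce_loss.item())  # inspect the raw cross-entropy value first

total_loss = 0 + ce_loss  # Python scalar 0, as when the custom loss returns 0
total_loss.backward()
print(output.grad.abs().sum().item())  # nonzero: the CE gradient survives
```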

Thanks for the reply. I am double-checking the numerical issue. Currently, it seems that tensor broadcasting introduces a non-negligible error: an entry that is expected to be zero comes out as something like 7e-7.

The loss should be greater than or equal to 0, but sometimes it comes out as a very small negative value. I haven’t figured out the reason yet.
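For what it's worth, here is a small illustration of how a mathematically non-negative quantity can come out slightly negative in float32 through cancellation (the computation is a hypothetical stand-in, not the actual Compute), together with a clamp-before-sqrt guard:

```python
import torch

torch.manual_seed(0)
# E[x^2] - E[x]^2 is mathematically >= 0, but with a large offset the
# two terms nearly cancel and float32 rounding error dominates.
x = 100.0 + torch.randn(10000) * 1e-3
var = (x * x).mean() - x.mean() ** 2
print(var.item())  # may come out as a tiny negative number

# Clamping to a small positive floor keeps sqrt and its gradient finite.
safe = torch.sqrt(torch.clamp(var, min=1e-12))
print(safe.item())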