What's the difference between Sigmoid+BCELoss and BCEWithLogitsLoss?

When I use nn.Sigmoid()+nn.BCELoss(size_average=False), everything is ok.

But when I use nn.BCEWithLogitsLoss(size_average=False), I get the following error:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation

My training code:

    for batch_idx, (x, yt) in enumerate(self.data_loader):
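        # `async` is a reserved word since Python 3.7; the argument is now called non_blocking
        x = x.cuda(non_blocking=True)
        yt = yt.cuda(non_blocking=True)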
        input_var = Variable(x)
        target_var = Variable(yt)
        y = self.model(input_var)
        loss = self.criterion(y, target_var)

                    # Y=new_loss.cpu(),

Some code seems to be missing from your example. For instance, the variable new_loss is not initialized in your snippet.

For a start, you can try removing all code not directly related to the forward/backward passes. That will definitely make debugging easier.
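As a rough sketch of what such a stripped-down test could look like: the model, tensor shapes, and data below are hypothetical stand-ins (an `nn.Linear` instead of `self.model`, random tensors instead of `self.data_loader`), and `reduction="sum"` is the current spelling of `size_average=False`. Running the pass under `torch.autograd.detect_anomaly()`, which needs a reasonably recent PyTorch, should make autograd point at the forward operation involved if the in-place error shows up again.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for self.model and one batch from self.data_loader
model = nn.Linear(8, 1)
criterion = nn.BCEWithLogitsLoss(reduction="sum")
x = torch.randn(16, 8)
yt = torch.randint(0, 2, (16, 1)).float()  # targets must be float, same shape as the output

# Only the forward and backward passes, nothing else
with torch.autograd.detect_anomaly():
    y = model(x)            # raw logits, no sigmoid here
    loss = criterion(y, yt)
    loss.backward()

grad_ok = model.weight.grad is not None
print(loss.item(), grad_ok)
```

If this runs cleanly but the full training loop does not, the in-place modification is probably somewhere in the code that was removed, for example inside the model's forward or in post-processing of `y`.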

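Applying BCELoss on top of a separate sigmoid layer is numerically unstable. BCEWithLogitsLoss exists to address this: it combines the sigmoid and the binary cross-entropy in a single module that takes raw logits, which lets it use the log-sum-exp trick for numerical stability. (by yzgao)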
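To make the stability point concrete, here is a small sketch comparing the two approaches. The tensors and the logit value of 40 are made up for illustration, and `reduction="sum"` is used in place of the deprecated `size_average=False`. For moderate logits the two losses should agree. For a large logit, the sigmoid saturates to exactly 1.0 in float32, so BCELoss can no longer recover the true loss.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(4, 3)                      # illustrative raw model outputs
targets = torch.randint(0, 2, (4, 3)).float()

# Moderate logits: Sigmoid + BCELoss and BCEWithLogitsLoss give the same value
loss_two_step = nn.BCELoss(reduction="sum")(torch.sigmoid(logits), targets)
loss_fused = nn.BCEWithLogitsLoss(reduction="sum")(logits, targets)
moderate_match = torch.allclose(loss_two_step, loss_fused, atol=1e-4)

# Extreme logit with target 0: the true loss is log(1 + exp(40)), about 40
big_logit = torch.tensor([40.0])
zero_target = torch.tensor([0.0])
saturated = nn.BCELoss()(torch.sigmoid(big_logit), zero_target)  # sigmoid is exactly 1.0 here
stable = nn.BCEWithLogitsLoss()(big_logit, zero_target)          # computed from the logit directly

print(moderate_match, saturated.item(), stable.item())
```

Note that BCEWithLogitsLoss expects raw logits, so the model should not end in nn.Sigmoid() when this loss is used.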