What's the difference between Sigmoid+BCELoss and BCEWithLogitsLoss?

(Mata Fu) #1

When I use nn.Sigmoid()+nn.BCELoss(size_average=False), everything is ok.

But when I use nn.BCEWithLogitsLoss(size_average=False), I get the following error:

    Traceback (most recent call last):
      File "C:\Program Files\JetBrains\PyCharm 2017.3.3\helpers\pydev\pydevd.py", line 1668, in <module>
      File "C:\Program Files\JetBrains\PyCharm 2017.3.3\helpers\pydev\pydevd.py", line 1662, in main
        globals = debugger.run(setup['file'], None, None, is_module)
      File "C:\Program Files\JetBrains\PyCharm 2017.3.3\helpers\pydev\pydevd.py", line 1072, in run
        pydev_imports.execfile(file, globals, locals) # execute the script
      File "C:\Program Files\JetBrains\PyCharm 2017.3.3\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
        exec(compile(contents+"\n", file, 'exec'), glob, loc)
      File "E:/pytorch_project/pytorch-cfdnet/main_cfd.py", line 349, in <module>
      File "E:/pytorch_project/pytorch-cfdnet/main_cfd.py", line 305, in main
        train_error = train.forward()
      File "E:/pytorch_project/pytorch-cfdnet\train.py", line 72, in forward
      File "C:\Users\18mat\Anaconda3\lib\site-packages\torch\autograd\variable.py", line 167, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
      File "C:\Users\18mat\Anaconda3\lib\site-packages\torch\autograd\__init__.py", line 99, in backward
        variables, grad_variables, retain_graph)
    RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation

Process finished with exit code 1

My training code:

    for batch_idx, (x, yt) in enumerate(self.data_loader):
        x = x.cuda(async=True)
        yt = yt.cuda(async=True)
        input_var = Variable(x)
        target_var = Variable(yt)
        y = self.model(input_var)
        loss = self.criterion(y, target_var)

        if self.vis:
                    X=torch.ones((1, 1)).cpu() * self.iterations,
                    # Y=new_loss.cpu(),

        # measure accuracy and record loss
        total_loss += new_loss.cuda()
        # compute gradient and do SGD step


        if batch_idx % 10 == 0:
            if (batch_idx*len(x) + 10*len(x)) <= len(self.data_loader.dataset):
                pbar.update(10 * len(x))
                pbar.update(len(self.data_loader.dataset) - batch_idx*len(x))

        self.iterations += 1


(Emanuel Jöbstl) #2

Some code seems to be missing from your example. For instance, the variable new_loss is never initialized in your snippet.

For a start, try removing all code that is not directly related to the forward/backward passes. That will definitely make debugging easier.
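
To illustrate what a stripped-down version might look like, here is a minimal sketch of a single training step with only the forward pass, loss, and backward pass. It is not the poster's code: the model, the random batch, and the SGD optimizer are hypothetical stand-ins, and it assumes a recent PyTorch release, where reduction="sum" takes the place of size_average=False and Variable wrappers are no longer needed.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for self.model and one batch from self.data_loader.
model = nn.Linear(4, 1)
x = torch.randn(8, 4)
yt = torch.randint(0, 2, (8, 1)).float()

# Assumption: recent PyTorch, where reduction="sum" replaces size_average=False.
criterion = nn.BCEWithLogitsLoss(reduction="sum")
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

optimizer.zero_grad()
y = model(x)              # raw logits: no sigmoid layer before this loss
loss = criterion(y, yt)
loss.backward()
optimizer.step()

# Bookkeeping uses a plain Python float, so it stays outside the autograd graph.
total_loss = 0.0
total_loss += loss.item()
print(total_loss)
```

If a step like this runs cleanly, the logging and progress-bar code can be added back one piece at a time until the error reappears.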

(dohwan.lee) #3

Applying BCELoss on top of a separate sigmoid layer is numerically unstable. (by yzgao)
So it was necessary to add a loss module that combines the two steps, such as BCEWithLogitsLoss, to address this numerical stability issue. (by yzgao)
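
The stability point can be checked with a small sketch, assuming a recent PyTorch release. The logits and targets below are made-up values. The `manual` line is a commonly used stable rewrite of the loss, shown for comparison only and not necessarily how PyTorch implements it internally.

```python
import torch
import torch.nn as nn

# Made-up values: two moderate logits and one large logit with target 0.
logits = torch.tensor([0.5, -1.2, 30.0])
targets = torch.tensor([1.0, 0.0, 0.0])

# Two-step version: sigmoid first, then BCELoss on the probabilities.
two_step = nn.BCELoss(reduction="none")(torch.sigmoid(logits), targets)

# Fused version: BCEWithLogitsLoss takes the raw logits directly.
fused = nn.BCEWithLogitsLoss(reduction="none")(logits, targets)

# A common stable rewrite: max(x, 0) - x*y + log(1 + exp(-|x|)).
manual = (logits.clamp(min=0) - logits * targets
          + torch.log1p(torch.exp(-logits.abs())))

print("sigmoid + BCELoss :", two_step)
print("BCEWithLogitsLoss :", fused)
print("manual stable form:", manual)
```

For the moderate logits the two versions should agree to within float32 rounding. For the large logit, the sigmoid saturates to exactly 1.0 in float32, so the two-step loss can no longer recover the true value (about 30), while the fused loss still can.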