What's the difference between Sigmoid+BCELoss and BCEWithLogitsLoss


(Mata Fu) #1

When I use nn.Sigmoid() + nn.BCELoss(size_average=False), everything is OK.

But when I use nn.BCEWithLogitsLoss(size_average=False), I get the following error:

    Traceback (most recent call last):
      File "C:\Program Files\JetBrains\PyCharm 2017.3.3\helpers\pydev\pydevd.py", line 1668, in <module>
        main()
      File "C:\Program Files\JetBrains\PyCharm 2017.3.3\helpers\pydev\pydevd.py", line 1662, in main
        globals = debugger.run(setup['file'], None, None, is_module)
      File "C:\Program Files\JetBrains\PyCharm 2017.3.3\helpers\pydev\pydevd.py", line 1072, in run
        pydev_imports.execfile(file, globals, locals)  # execute the script
      File "C:\Program Files\JetBrains\PyCharm 2017.3.3\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
        exec(compile(contents+"\n", file, 'exec'), glob, loc)
      File "E:/pytorch_project/pytorch-cfdnet/main_cfd.py", line 349, in <module>
        main()
      File "E:/pytorch_project/pytorch-cfdnet/main_cfd.py", line 305, in main
        train_error = train.forward()
      File "E:/pytorch_project/pytorch-cfdnet\train.py", line 72, in forward
        loss.backward()
      File "C:\Users\18mat\Anaconda3\lib\site-packages\torch\autograd\variable.py", line 167, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
      File "C:\Users\18mat\Anaconda3\lib\site-packages\torch\autograd\__init__.py", line 99, in backward
        variables, grad_variables, retain_graph)
    RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation

    Process finished with exit code 1

My training code:

    for batch_idx, (x, yt) in enumerate(self.data_loader):
        x = x.cuda(async=True)
        yt = yt.cuda(async=True)
        input_var = Variable(x)
        target_var = Variable(yt)
        y = self.model(input_var)
        loss = self.criterion(y, target_var)

        if self.vis:
            self.vis.line(
                    X=torch.ones((1, 1)).cpu() * self.iterations,
                    Y=torch.Tensor([loss.data[0]]).unsqueeze(0).cpu(),
                    # Y=new_loss.cpu(),
                    win=self.loss_window,
                    update='append')

        # measure accuracy and record loss
        total_loss += new_loss.cuda()
        
        # compute gradient and do SGD step
        self.optimizer.zero_grad()
        loss.backward()

        self.optimizer.step()

        if batch_idx % 10 == 0:
            if (batch_idx*len(x) + 10*len(x)) <= len(self.data_loader.dataset):
                pbar.update(10 * len(x))
            else:
                pbar.update(len(self.data_loader.dataset) - batch_idx*len(x))

        self.iterations += 1

    pbar.close()
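
For context on the question itself: the two setups are meant to compute the same value, provided the sigmoid is applied exactly once, so the model has to output raw logits when nn.BCEWithLogitsLoss is used. A minimal sketch on dummy data (the tensor names below are illustrative, not taken from the project above):

    import torch
    import torch.nn as nn
    from torch.autograd import Variable

    logits = Variable(torch.randn(4, 1))   # raw model outputs, no sigmoid applied
    targets = Variable(torch.rand(4, 1))   # targets in [0, 1]

    # Setup A: explicit sigmoid followed by BCELoss
    loss_a = nn.BCELoss(size_average=False)(nn.Sigmoid()(logits), targets)

    # Setup B: BCEWithLogitsLoss applies the sigmoid internally
    loss_b = nn.BCEWithLogitsLoss(size_average=False)(logits, targets)

    print(loss_a, loss_b)  # equal up to floating-point error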

(Emanuel Jöbstl) #2

Some code seems to be missing from your example; for instance, the variable new_loss is never initialized in your snippet.

To start, you can try removing all code not directly related to the forward/backward passes. That will definitely make debugging easier.
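
A minimal sketch of such a stripped-down loop (nn.Linear and the dummy tensors are just hypothetical stand-ins for the real model and data):

    import torch
    import torch.nn as nn
    from torch.autograd import Variable

    model = nn.Linear(8, 1)                  # stand-in for the real model
    criterion = nn.BCEWithLogitsLoss(size_average=False)

    x = Variable(torch.randn(4, 8))          # dummy input batch
    yt = Variable(torch.rand(4, 1))          # dummy targets in [0, 1]

    y = model(x)                             # forward pass only
    loss = criterion(y, yt)
    loss.backward()                          # does the RuntimeError still occur here?

If the error disappears in this reduced form, add the removed pieces back one at a time until it reappears.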


(dohwan.lee) #3

Applying a separate sigmoid layer followed by BCELoss is numerically unstable. (by yzgao)
BCEWithLogitsLoss combines the sigmoid and BCELoss in a single module, using the log-sum-exp trick to address this numerical stability issue. (by yzgao)
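
To illustrate with a deliberately extreme logit (a rough sketch, not taken from the thread above): in float32 the sigmoid saturates to exactly 1.0, so the separate-sigmoid path ends up taking log(0), while the fused loss works in log space and stays finite:

    import torch
    import torch.nn as nn
    from torch.autograd import Variable

    x = Variable(torch.Tensor([100.0]))   # an extreme logit
    y = Variable(torch.Tensor([0.0]))     # target

    p = nn.Sigmoid()(x)                   # saturates to exactly 1.0 in float32
    print(torch.log(1 - p))               # -inf: the term BCELoss would need
    print(nn.BCEWithLogitsLoss()(x, y))   # finite: 100

Newer versions of nn.BCELoss clamp their log terms to avoid returning inf, but the underlying saturation is still there; BCEWithLogitsLoss avoids it altogether.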