What's the difference between Sigmoid+BCELoss and BCEWithLogitsLoss


(Mata Fu) #1

When I use nn.Sigmoid() + nn.BCELoss(size_average=False), everything is OK.

But when I use nn.BCEWithLogitsLoss(size_average=False), I get the following error:

    Traceback (most recent call last):
      File "C:\Program Files\JetBrains\PyCharm 2017.3.3\helpers\pydev\pydevd.py", line 1668, in <module>
        main()
      File "C:\Program Files\JetBrains\PyCharm 2017.3.3\helpers\pydev\pydevd.py", line 1662, in main
        globals = debugger.run(setup['file'], None, None, is_module)
      File "C:\Program Files\JetBrains\PyCharm 2017.3.3\helpers\pydev\pydevd.py", line 1072, in run
        pydev_imports.execfile(file, globals, locals)  # execute the script
      File "C:\Program Files\JetBrains\PyCharm 2017.3.3\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
        exec(compile(contents+"\n", file, 'exec'), glob, loc)
      File "E:/pytorch_project/pytorch-cfdnet/main_cfd.py", line 349, in <module>
        main()
      File "E:/pytorch_project/pytorch-cfdnet/main_cfd.py", line 305, in main
        train_error = train.forward()
      File "E:/pytorch_project/pytorch-cfdnet\train.py", line 72, in forward
        loss.backward()
      File "C:\Users\18mat\Anaconda3\lib\site-packages\torch\autograd\variable.py", line 167, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
      File "C:\Users\18mat\Anaconda3\lib\site-packages\torch\autograd\__init__.py", line 99, in backward
        variables, grad_variables, retain_graph)
    RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation

    Process finished with exit code 1

My training code:

    for batch_idx, (x, yt) in enumerate(self.data_loader):
        x = x.cuda(async=True)
        yt = yt.cuda(async=True)
        input_var = Variable(x)
        target_var = Variable(yt)
        y = self.model(input_var)
        loss = self.criterion(y, target_var)

        if self.vis:
            self.vis.line(
                    X=torch.ones((1, 1)).cpu() * self.iterations,
                    Y=torch.Tensor([loss.data[0]]).unsqueeze(0).cpu(),
                    # Y=new_loss.cpu(),
                    win=self.loss_window,
                    update='append')

        # measure accuracy and record loss
        total_loss += new_loss.cuda()
        
        # compute gradient and do SGD step
        self.optimizer.zero_grad()
        loss.backward()

        self.optimizer.step()

        if batch_idx % 10 == 0:
            if (batch_idx*len(x) + 10*len(x)) <= len(self.data_loader.dataset):
                pbar.update(10 * len(x))
            else:
                pbar.update(len(self.data_loader.dataset) - batch_idx*len(x))

        self.iterations += 1

    pbar.close()
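
For context on the question itself: the two setups are meant to compute the same value, provided the sigmoid is applied exactly once, so the model has to output raw logits when nn.BCEWithLogitsLoss is used. A minimal sketch on dummy data (the tensor names below are illustrative, not taken from the project above):

    import torch
    import torch.nn as nn
    from torch.autograd import Variable

    logits = Variable(torch.randn(4, 1))   # raw model outputs, no sigmoid applied
    targets = Variable(torch.rand(4, 1))   # targets in [0, 1]

    # Setup A: explicit sigmoid followed by BCELoss
    loss_a = nn.BCELoss(size_average=False)(nn.Sigmoid()(logits), targets)

    # Setup B: BCEWithLogitsLoss applies the sigmoid internally
    loss_b = nn.BCEWithLogitsLoss(size_average=False)(logits, targets)

    print(loss_a, loss_b)  # equal up to floating-point error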

(Emanuel Jöbstl) #2

Some code seems to be missing from your example; for instance, the variable new_loss is never initialized in your snippet.

To start, you can try removing all code not directly related to the forward/backward passes. That will definitely make debugging easier.
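
A minimal sketch of such a stripped-down loop (nn.Linear and the dummy tensors are just hypothetical stand-ins for the real model and data):

    import torch
    import torch.nn as nn
    from torch.autograd import Variable

    model = nn.Linear(8, 1)                  # stand-in for the real model
    criterion = nn.BCEWithLogitsLoss(size_average=False)

    x = Variable(torch.randn(4, 8))          # dummy input batch
    yt = Variable(torch.rand(4, 1))          # dummy targets in [0, 1]

    y = model(x)                             # forward pass only
    loss = criterion(y, yt)
    loss.backward()                          # does the RuntimeError still occur here?

If the error disappears in this reduced form, add the removed pieces back one at a time until it reappears.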


(dohwan.lee) #3

Applying a separate sigmoid layer followed by BCELoss is numerically unstable. (by yzgao)
BCEWithLogitsLoss combines the sigmoid and BCELoss in a single module, using the log-sum-exp trick to address this numerical stability issue. (by yzgao)
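
To illustrate with a deliberately extreme logit (a rough sketch, not taken from the thread above): in float32 the sigmoid saturates to exactly 1.0, so the separate-sigmoid path ends up taking log(0), while the fused loss works in log space and stays finite:

    import torch
    import torch.nn as nn
    from torch.autograd import Variable

    x = Variable(torch.Tensor([100.0]))   # an extreme logit
    y = Variable(torch.Tensor([0.0]))     # target

    p = nn.Sigmoid()(x)                   # saturates to exactly 1.0 in float32
    print(torch.log(1 - p))               # -inf: the term BCELoss would need
    print(nn.BCEWithLogitsLoss()(x, y))   # finite: 100

Newer versions of nn.BCELoss clamp their log terms to avoid returning inf, but the underlying saturation is still there; BCEWithLogitsLoss avoids it altogether.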