BCE loss giving very small negative values

Mihai_Calugar · April 14, 2019, 9:30am

I’m trying to train a segmentation model using the UNet architecture. I am using code similar to https://github.com/milesial/Pytorch-UNet for training, and with the Carvana Masking Challenge dataset (specifically the one mentioned on GitHub) the network works just fine and reaches losses of at most 0.7.

My dataset contains biomedical images of breast scans with corresponding masks, but whenever I run it, the loss for each batch comes out as a very small negative value, and I can’t figure out why.

import os
import sys
import time
from optparse import OptionParser

import numpy as np
import torch
import torch.nn as nn
from torch import optim

from unet import UNet
from utils import get_ids, split_ids, split_train_val, get_imgs_and_masks, batch


def train_net(net,
              epochs=5,
              batch_size=1,
              lr=0.1,
              val_percent=0.05,
              save_cp=True,
              gpu=True,
              img_scale=0.5):

    dir_img = 'E:/Mihaica/Faculta/An4/Licenta/TrainingIncercare/256-jpg-calcifieri/256-jpg/roi_256_24bp/'
    dir_mask = 'E:/Mihaica/Faculta/An4/Licenta/TrainingIncercare/256-jpg-calcifieri/256-jpg/mask_256/'

    dir_checkpoint = 'checkpoints/'

    ids = get_ids(dir_img)
    ids = split_ids(ids)

    iddataset = split_train_val(ids, val_percent)

    print('''
    Starting training:
        Epochs: {}
        Batch size: {}
        Learning rate: {}
        Training size: {}
        Validation size: {}
        Checkpoints: {}
        CUDA: {}
    '''.format(epochs, batch_size, lr, len(iddataset['train']),
               len(iddataset['val']), str(save_cp), str(gpu)))

    N_train = len(iddataset['train'])

    optimizer = optim.SGD(net.parameters(),
                          lr=lr,
                          momentum=0.9,
                          weight_decay=0.0005)

    criterion = nn.BCELoss()

    # train = get_imgs_and_masks(iddataset['train'], dir_img, dir_mask, img_scale)
    # print(train[0])

    for epoch in range(epochs):
        print('Starting epoch {}/{}.'.format(epoch + 1, epochs))
        net.train()

        # reset the generators
        train = get_imgs_and_masks(iddataset['train'], dir_img, dir_mask, img_scale)
        # val = get_imgs_and_masks(iddataset['val'], dir_img, dir_mask, img_scale)

        epoch_loss = 0

        for i, b in enumerate(batch(train, batch_size)):
            # each sample in the batch is an (img, mask) pair
            imgs = np.array([sample[0] for sample in b])
            # imgs = np.array([sample[0] for sample in b]).astype(np.float32)
            true_masks = np.array([sample[1] for sample in b])

            imgs = torch.from_numpy(imgs)
            true_masks = torch.from_numpy(true_masks)

            if gpu:
                imgs = imgs.cuda()
                true_masks = true_masks.cuda()

            masks_pred = net(imgs)
            masks_probs_flat = masks_pred.view(-1)

            true_masks_flat = true_masks.view(-1)

            loss = criterion(masks_probs_flat, true_masks_flat)
            epoch_loss += loss.item()

            print('{0:.4f} --- loss: {1:.6f}'.format(i * batch_size / N_train, loss.item()))

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        # i is the 0-based index of the last batch, so there were i + 1 batches
        print('Epoch finished ! Loss: {}'.format(epoch_loss / (i + 1)))

        # if 1:
        #     val_dice = eval_net(net, val, gpu)
        #     print('Validation Dice Coeff: {}'.format(val_dice))

        if save_cp:
            torch.save(net.state_dict(),
                       dir_checkpoint + 'CP{}.pth'.format(epoch + 1))
            print('Checkpoint {} saved !'.format(epoch + 1))

Utils for loading and preprocessing:

import os
from PIL import Image
from .utils import resize_and_crop, get_square, normalize, hwc_to_chw


def get_ids(dir):
    return (f[:-4] for f in os.listdir(dir))


def split_ids(ids, n=2):
    # Split each id in n, creating n tuples (id, k) for each id
    return ((id, i)  for id in ids for i in range(n))


def to_cropped_imgs(ids, dir, suffix, scale):
    """From a list of tuples, returns the correct cropped img"""
    for id1, pos in ids:
        id = id1.replace(".", "")
        im = resize_and_crop(Image.open(dir + id + suffix), scale=scale)
        yield get_square(im, pos)


def get_imgs_and_masks(ids, dir_img, dir_mask, scale):
    """Return all the couples (img, mask)"""

    imgs = to_cropped_imgs(ids, dir_img, '.jpg', scale)

    # need to transform from HWC to CHW
    imgs_switched = map(hwc_to_chw, imgs)
    imgs_normalized = map(normalize, imgs_switched)

    masks = to_cropped_imgs(ids, dir_mask, '_mask.jpg', scale)

    return zip(imgs_normalized, masks)

Any sort of help is much appreciated. Thanks
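Since the masks here are read from `.jpg` files and, unlike the images, are not passed through `normalize` in `get_imgs_and_masks`, one thing worth ruling out is mask values in [0, 255] instead of [0, 1]; JPEG compression can also leave intermediate gray values. The sketch below uses a synthetic 0/255 mask and a made-up prediction (assumptions for illustration, not data from this thread) to show that such targets can make `nn.BCELoss` negative, and that rescaling and thresholding the mask restores a non-negative loss.

```python
import numpy as np
import torch
import torch.nn as nn

# Synthetic stand-in for a mask decoded from a .jpg file: values 0 or 255.
mask_uint8 = np.zeros((4, 4), dtype=np.uint8)
mask_uint8[1:3, 1:3] = 255

# A confident, mostly correct prediction (probabilities, i.e. after a sigmoid).
pred = torch.full((4, 4), 0.01)
pred[1:3, 1:3] = 0.99

criterion = nn.BCELoss()

# Raw 0/255 targets: the (1 - y) weight in BCE becomes -254, so the loss goes negative.
raw_target = torch.from_numpy(mask_uint8).float()
raw_loss = criterion(pred, raw_target)

# Rescale to [0, 1] and binarize, since JPEG artifacts can leave values between 0 and 255.
binary_target = (raw_target / 255.0 > 0.5).float()
fixed_loss = criterion(pred, binary_target)

print(f"raw 0/255 targets: {raw_loss.item():.4f}")
print(f"binary targets:    {fixed_loss.item():.4f}")
```

Thresholding rather than only dividing by 255 is a design choice for binary masks; for soft masks, dividing by 255 alone keeps the targets in [0, 1].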

nunenuh · May 29, 2019, 12:07am

I have the same problem with the newer version of PyTorch. Before version 1.x I trained UNet with no problems at all using BCELoss or DiceLoss, but now when I train it again, the loss can become negative with BCELoss or with BCEWithLogitsLoss. Does anyone have a solution to this problem?
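Both criteria compute the same binary cross-entropy, one from probabilities and one from raw logits, and as far as I can tell neither checks that the target lies in [0, 1]. A minimal check, assuming a target of -1 (for example a pixel produced by a normalization that maps to [-1, 1]) and some arbitrary negative logits:

```python
import torch
import torch.nn as nn

# Raw network outputs, all negative, i.e. the model predicts "background".
logits = torch.tensor([-4.0, -2.0, -1.0])
probs = torch.sigmoid(logits)

targets = {
    "in [0, 1]": torch.tensor([0.0, 0.5, 1.0]),
    "equal to -1": torch.full((3,), -1.0),  # e.g. pixels normalized to -1
}

bce = nn.BCELoss()                   # expects probabilities
bce_logits = nn.BCEWithLogitsLoss()  # applies the sigmoid internally

results = {}
for name, target in targets.items():
    results[name] = (bce(probs, target).item(), bce_logits(logits, target).item())
    print(f"target {name}: BCELoss={results[name][0]:.4f}, "
          f"BCEWithLogitsLoss={results[name][1]:.4f}")
```

Both losses agree (up to floating-point error) and both turn negative for the -1 target, so switching between them would not be expected to hide an out-of-range target.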

ptrblck · May 29, 2019, 10:35am

Do you have a code snippet to reproduce this issue?

nunenuh · May 29, 2019, 12:23pm

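import torch
import torch.nn as nn
from torch import optim


class BaseConv(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, padding,
                 stride):
        super(BaseConv, self).__init__()

        self.act = nn.ReLU()

        # nn.Conv2d's positional order is (..., kernel_size, stride, padding),
        # so pass padding and stride by keyword to avoid swapping them.
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size,
                               padding=padding, stride=stride)

        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size,
                               padding=padding, stride=stride)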

    def forward(self, x):
        x = self.act(self.conv1(x))
        x = self.act(self.conv2(x))
        return x


class DownConv(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, padding,
                 stride):
        super(DownConv, self).__init__()

        self.pool1 = nn.MaxPool2d(kernel_size=2)
        self.conv_block = BaseConv(in_channels, out_channels, kernel_size,
                                   padding, stride)

    def forward(self, x):
        x = self.pool1(x)
        x = self.conv_block(x)
        return x


class UpConv(nn.Module):
    def __init__(self, in_channels, in_channels_skip, out_channels,
                 kernel_size, padding, stride):
        super(UpConv, self).__init__()

        self.conv_trans1 = nn.ConvTranspose2d(
            in_channels, in_channels, kernel_size=2, padding=0, stride=2)
        self.conv_block = BaseConv(
            in_channels=in_channels + in_channels_skip,
            out_channels=out_channels,
            kernel_size=kernel_size,
            padding=padding,
            stride=stride)

    def forward(self, x, x_skip):
        x = self.conv_trans1(x)
        x = torch.cat((x, x_skip), dim=1)
        x = self.conv_block(x)
        return x


class UNet(nn.Module):
    def __init__(self, in_channels, out_channels, n_class, kernel_size,
                 padding, stride):
        super(UNet, self).__init__()

        self.init_conv = BaseConv(in_channels, out_channels, kernel_size,
                                  padding, stride)

        self.down1 = DownConv(out_channels, 2 * out_channels, kernel_size,
                              padding, stride)

        self.down2 = DownConv(2 * out_channels, 4 * out_channels, kernel_size,
                              padding, stride)

        self.down3 = DownConv(4 * out_channels, 8 * out_channels, kernel_size,
                              padding, stride)

        self.up3 = UpConv(8 * out_channels, 4 * out_channels, 4 * out_channels,
                          kernel_size, padding, stride)

        self.up2 = UpConv(4 * out_channels, 2 * out_channels, 2 * out_channels,
                          kernel_size, padding, stride)

        self.up1 = UpConv(2 * out_channels, out_channels, out_channels,
                          kernel_size, padding, stride)

        self.out = nn.Conv2d(out_channels, n_class, kernel_size,
                             padding=padding, stride=stride)

    def forward(self, x):
        # Encoder
        x = self.init_conv(x)
        x1 = self.down1(x)
        x2 = self.down2(x1)
        x3 = self.down3(x2)
        # Decoder
        x_up = self.up3(x3, x2)
        x_up = self.up2(x_up, x1)
        x_up = self.up1(x_up, x)
        x_out = torch.sigmoid(self.out(x_up))  # probabilities for nn.BCELoss
        return x_out

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = UNet(in_channels=1,
             out_channels=32,
             n_class=1,
             kernel_size=3,
             padding=1,
             stride=1)
# model = UNet(n_channels=1, n_classes=1)
model = model.to(device)

optimizer = optim.SGD(model.parameters(), lr=0.001)
criterion = nn.BCELoss()

for epoch in range(10):
    model.train()
    for idx, (feat, targ) in enumerate(data.train_loader):
        feat = feat.to(device)
        targ = targ.to(device)

        optimizer.zero_grad()
        pred = model(feat)
        loss = criterion(pred, targ)
        loss.backward()
        optimizer.step()
        
        print(f'Epoch {epoch} Train Loss {idx}: {loss.item()}')
    
    with torch.no_grad():
        model.eval()
        for idx, (feat, targ) in enumerate(data.valid_loader):
            feat = feat.to(device)
            targ = targ.to(device)

            pred = model(feat)
            loss = criterion(pred, targ)

            print(f'Epoch {epoch} Valid Loss {idx}: {loss.item()}')

Epoch 1 Train Loss 37: -2.0091397762298584
Epoch 1 Train Loss 38: -1.8624688386917114
Epoch 1 Train Loss 39: -2.1319470405578613
Epoch 1 Train Loss 40: -2.592771053314209
Epoch 1 Train Loss 41: -3.217898368835449
Epoch 1 Train Loss 42: -3.0293173789978027
Epoch 1 Train Loss 43: -3.380110025405884
Epoch 1 Train Loss 44: -4.6700944900512695
Epoch 1 Valid Loss 0: -8.985795021057129
Epoch 1 Valid Loss 1: -9.246602058410645
Epoch 1 Valid Loss 2: -9.319241523742676
Epoch 1 Valid Loss 3: -9.418668746948242
Epoch 1 Valid Loss 4: -9.03136157989502
Epoch 1 Valid Loss 5: -9.369157791137695
Epoch 1 Valid Loss 6: -9.51211929321289
Epoch 1 Valid Loss 7: -9.425999641418457
Epoch 1 Valid Loss 8: -9.599837303161621
Epoch 1 Valid Loss 9: -9.633952140808105
Epoch 1 Valid Loss 10: -9.373023986816406
Epoch 1 Valid Loss 11: -9.418418884277344
Epoch 1 Valid Loss 12: -9.380928993225098
Epoch 1 Valid Loss 13: -9.855154991149902
Epoch 1 Valid Loss 14: -9.249217987060547
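A side note on the layer definitions: `nn.Conv2d`'s positional parameters are ordered `(in_channels, out_channels, kernel_size, stride, padding, ...)`, so a call written as `nn.Conv2d(in_ch, out_ch, kernel_size, padding, stride)` binds the padding value to `stride` and vice versa. With `padding=1, stride=1`, as in the configuration here, the swap is invisible, but with other values it changes the output shape. A quick check (the channel count, input size, and `padding, stride = 1, 2` are arbitrary choices for illustration):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 32, 32)
padding, stride = 1, 2

# Positional call: the 4th argument is stride and the 5th is padding.
swapped = nn.Conv2d(1, 8, 3, padding, stride)
# Keyword call: each value goes where it was intended.
intended = nn.Conv2d(1, 8, 3, padding=padding, stride=stride)

print(swapped.stride, swapped.padding)    # (1, 1) (2, 2)
print(intended.stride, intended.padding)  # (2, 2) (1, 1)
print(swapped(x).shape, intended(x).shape)
```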

ptrblck · May 29, 2019, 12:28pm

I cannot reproduce this issue using random data:

for _ in range(100):
    feat = torch.randn(1, 1, 224, 224).to(device)
    targ = torch.randint(0, 2, (1, 1, 224, 224)).float().to(device)
    optimizer.zero_grad()
    pred = model(feat)
    loss = criterion(pred, targ)
    print(loss.item())
    loss.backward()
    optimizer.step()

Each loss is a positive number.
Could you post an input and target sample which yields this result?
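One way to find such a sample is to check the target range of every batch before computing the loss. The helper below, `assert_valid_bce_target`, is a hypothetical sketch (not part of PyTorch) that raises as soon as a target falls outside the [0, 1] range BCE assumes; the tensors stand in for batches from a `DataLoader`.

```python
import torch


def assert_valid_bce_target(target: torch.Tensor, batch_idx: int) -> None:
    """Hypothetical helper: raise if a BCE target leaves the [0, 1] range."""
    t_min, t_max = target.min().item(), target.max().item()
    if t_min < 0.0 or t_max > 1.0:
        raise ValueError(
            f"batch {batch_idx}: target range [{t_min:.3f}, {t_max:.3f}] "
            "is outside [0, 1]; check the mask/target normalization"
        )


# A binary mask batch in [0, 1] passes silently.
ok_batch = torch.zeros(2, 1, 8, 8)
ok_batch[..., 2:6, 2:6] = 1.0
assert_valid_bce_target(ok_batch, batch_idx=0)

# The same mask mapped to [-1, 1] is rejected.
bad_batch = ok_batch * 2 - 1
try:
    assert_valid_bce_target(bad_batch, batch_idx=1)
except ValueError as err:
    print(err)
```

In a training loop this would be called on `targ` right before `criterion(pred, targ)`; it costs one min/max reduction per batch, so it can be removed once the data pipeline is confirmed.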

nunenuh · May 29, 2019, 12:34pm

Have you tried it with the MNIST dataset, training an autoencoder?

ptrblck · May 29, 2019, 12:42pm

Yes. It still works fine, as long as the target values are in [0, 1].

nunenuh · May 29, 2019, 1:18pm

I used the MNIST dataset, and this is the result:
Epoch 0 Train Loss 149: 0.14252015948295593
Epoch 0 Train Loss 150: 0.13949544727802277
Epoch 0 Train Loss 151: 0.14344342052936554
Epoch 0 Train Loss 152: 0.1380266398191452
Epoch 0 Train Loss 153: 0.08092612773180008
Epoch 0 Train Loss 154: 0.04462871327996254
Epoch 0 Train Loss 155: 0.026371218264102936
Epoch 0 Train Loss 156: 0.00028531771386042237
Epoch 0 Train Loss 157: 0.006431366316974163
Epoch 0 Train Loss 158: 0.007001318968832493
Epoch 0 Train Loss 159: -0.01587434858083725
Epoch 0 Train Loss 160: -0.0454784594476223
Epoch 0 Train Loss 161: -0.04419191554188728
Epoch 0 Train Loss 162: -0.06265364587306976
Epoch 0 Train Loss 163: -0.1070045754313469
Epoch 0 Train Loss 164: -0.10746034979820251
Epoch 0 Train Loss 165: -0.18705613911151886
Epoch 0 Train Loss 166: -0.1402345448732376
Epoch 0 Train Loss 167: -0.14482370018959045
Epoch 0 Train Loss 168: -0.21205617487430573
Epoch 0 Train Loss 169: -0.298073947429657
Epoch 0 Train Loss 170: -0.2425336241722107
Epoch 0 Train Loss 171: -0.3488599359989166
Epoch 0 Train Loss 172: -0.4062669575214386
Epoch 0 Train Loss 173: -0.4244926869869232
Epoch 0 Train Loss 174: -0.45135900378227234
Epoch 0 Train Loss 175: -0.49181121587753296
Epoch 0 Train Loss 176: -0.5699509382247925

ptrblck · May 29, 2019, 1:21pm

Could you check the target range?
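If you pass negative values as the target, you can get negative loss values:

import torch
import torch.nn.functional as F

F.binary_cross_entropy(torch.tensor(0.), torch.tensor(-1.))  # tensor(-100.), since log(0) is clamped to -100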
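To see why, write out the per-element loss that `BCELoss` computes for a predicted probability $p$ and a target $y$:

$$\ell(p, y) = -\big[\, y \log p + (1 - y) \log(1 - p) \,\big]$$

For $y \in [0, 1]$ both weights $y$ and $1 - y$ are non-negative and both logarithms are non-positive, so $\ell \ge 0$. For $y = -1$ the expression becomes

$$\ell(p, -1) = \log p - 2 \log(1 - p),$$

which tends to $-\infty$ as $p \to 0$. `BCELoss` clamps its log outputs to be at least $-100$, so for $p = 0$, $y = -1$ the loss evaluates to $-(-1 \cdot (-100) + 2 \cdot 0) = -100$.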

nunenuh · May 29, 2019, 2:35pm

Okay, I think I found the problem: my normalization was returning -1. Thank you for your response.
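When the targets come from the same normalized tensor as the inputs (common in an autoencoder), a typical source of -1 values is a normalization such as `(x - 0.5) / 0.5`, which maps pixel values from [0, 1] to [-1, 1]. The sketch below assumes that particular mean and standard deviation (they are not stated in the thread) and shows two ways to get a valid BCE target back: keep the un-normalized tensor as the target, or invert the normalization.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
mean, std = 0.5, 0.5  # assumed normalization constants, not taken from the thread

pixels = torch.rand(4, 1, 28, 28)   # images in [0, 1], e.g. MNIST after ToTensor()
normalized = (pixels - mean) / std  # now in [-1, 1]; fine as the network *input*

# Option 1: keep the un-normalized [0, 1] tensor around and use it as the target.
target = pixels
# Option 2: invert the normalization when only the normalized tensor is available.
recovered = normalized * std + mean

pred = torch.sigmoid(torch.randn(4, 1, 28, 28))  # stand-in for the model output
criterion = nn.BCELoss()
print("min of normalized tensor:", normalized.min().item())  # below 0
print("loss with original target:", criterion(pred, target).item())
print("loss with recovered target:", criterion(pred, recovered).item())
```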

Kartik_Bhartiya · June 13, 2019, 6:35am

My normalisation is also returning -1 and so I am getting a negative loss.
Is getting a negative loss a problem?
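One way to reason about this: with a target in [0, 1], the per-element BCE is minimized when the prediction equals the target, but with a target of -1 it keeps decreasing as the prediction goes to 0, so the optimizer is rewarded for saturating the output rather than matching the target. A small numerical sketch of that behavior over an arbitrary grid of predicted probabilities:

```python
import torch
import torch.nn.functional as F

p = torch.linspace(0.01, 0.99, 99)  # candidate predicted probabilities


def bce_curve(y: float) -> torch.Tensor:
    """Per-element BCE for a constant target y, evaluated at every p."""
    return F.binary_cross_entropy(p, torch.full_like(p, y), reduction="none")


for y in (0.0, 0.3, 1.0, -1.0):
    losses = bce_curve(y)
    best_p = p[losses.argmin()].item()
    print(f"y={y:+.1f}: min loss {losses.min().item():+.4f} at p={best_p:.2f}")
```

For y = 0.3 the minimum sits at p ≈ 0.30, while for y = -1 it sits at the edge of the grid (p = 0.01) with a negative value, which suggests the loss is no longer measuring reconstruction quality in a meaningful way.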