U-net segmentation loss does not decrease

Hi all,

I’m quite new to PyTorch, so I may be making some very basic mistakes. I’m currently trying to do image segmentation with a U-Net, using a notebook structured very much like this notebook from the Carvana segmentation competition on Kaggle (https://www.kaggle.com/cankls3/carvana-pytorch).

I have a training dataset of 600 labeled RGB images (224x224) that I’m passing to the model:

train_data_loader = DL.DataLoader(train_dataset, batch_size=1, shuffle=False)

def double_conv(in_channels, out_channels):
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, 3, padding=1),
        nn.Conv2d(out_channels, out_channels, 3, padding=1),
    )

class UNet(nn.Module):

    def __init__(self, n_class):
        super().__init__()
        self.dconv_down1 = double_conv(3, 64)
        self.dconv_down2 = double_conv(64, 128)
        self.dconv_down3 = double_conv(128, 256)
        self.dconv_down4 = double_conv(256, 512)  

        self.maxpool = nn.MaxPool2d(2)
        self.dconv_up3 = double_conv(512, 256)
        self.dconv_up2 = double_conv(256, 128)
        self.dconv_up1 = double_conv(128, 64)
        self.TConv3 = nn.ConvTranspose2d(512, 256, 2, stride=2)
        self.TConv2 = nn.ConvTranspose2d(256, 128, 2, stride=2)
        self.TConv1 = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.conv_last = nn.Conv2d(64, n_class, 1)
    def forward(self, x):
        conv1 = self.dconv_down1(x)
        x = self.maxpool(conv1)
        conv2 = self.dconv_down2(x)
        x = self.maxpool(conv2)
        conv3 = self.dconv_down3(x)
        x = self.maxpool(conv3)
        x = self.dconv_down4(x)
        x = self.TConv3(x)
        x = torch.cat([x, conv3], dim=1)

        x = self.dconv_up3(x)
        x = self.TConv2(x)
        x = torch.cat([x, conv2], dim=1)

        x = self.dconv_up2(x)
        x = self.TConv1(x)
        x = torch.cat([x, conv1], dim=1)

        x = self.dconv_up1(x)
        out = self.conv_last(x)
        out = F.sigmoid(out)
        return out

Followed by:

model = UNet(n_class=1)
model = model.cuda()
optimizer = optim.Adam(model.parameters(), lr=1e-5)
criterion = nn.BCELoss()

When I start training, the loss goes down quite fast in the first 20 epochs, but then it saturates around 0.63 and does not really move from there.

I’ve tried changing the learning rate and switching to the SGD optimizer, but neither gave any significant improvement (with SGD the loss actually increases).

Any suggestions regarding the structure of the model or parameters to tune? Maybe the training set is not big enough.

This is my training for loop:


num_epochs = 50
running_loss = 0.0

for epoch in range(num_epochs):
    running_loss = 0.0
    for i, (X, y) in enumerate(train_data_loader):
        X = X.to(device)
        y = y.to(device)
        X = Variable(X)
        y = Variable(y)
        optimizer.zero_grad()
        output = model(X)
        loss = criterion(output, y)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        checkpoint = {
            'epoch': epoch,
            'model_state_dict': model.state_dict(),
            'optimizer_state_dict': optimizer.state_dict(),
            'loss': loss,
        }
        torch.save(checkpoint, 'checkpoint.pth')
        # accuracy
        #_, predicted = torch.max(output, y)
        #total_train += y.nelement()
        #correct_train += predicted.eq(y.data).sum().item()
        #train_accuracy = 100 * correct_train / total_train
        #avg_accuracy = train_accuracy / len(train_loader)  
        #,  "Training Accuracy: %d %%" % (train_accuracy)
    print("loss for epoch " + str(epoch) + ":  " + str(running_loss))

P.S. I’m trying to implement the training accuracy, but it is not working at the moment, so I’ve commented out those lines. If you have any ideas, please don’t hesitate to share them.

Thanks in advance for any help

The code looks generally alright.
You could try to remove the sigmoid activation at the end of your model and instead use nn.BCEWithLogitsLoss. This would give you more numerical stability and might avoid vanishing gradients if your output saturates.
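
To illustrate the stability point, here is a minimal sketch in plain Python (no PyTorch) that compares a naive "sigmoid, then log" loss with the log-sum-exp form commonly used for logit-based binary cross-entropy. The function names are made up for this example, and it is not meant as a description of PyTorch's exact internals.

```python
import math

def naive_bce(logit, target):
    # Sigmoid first, then log: breaks once the sigmoid rounds to exactly 0.0 or 1.0.
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

def stable_bce_with_logits(logit, target):
    # Algebraically equivalent form that never evaluates log(0).
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))

# Moderate logit: both forms agree.
print(naive_bce(2.0, 1.0), stable_bce_with_logits(2.0, 1.0))

# Saturated logit with the opposite target: the naive form fails.
try:
    naive_bce(40.0, 0.0)
except ValueError as err:
    print("naive form failed:", err)
print(stable_bce_with_logits(40.0, 0.0))
```

nn.BCELoss itself does not raise in this situation, since its documentation states that the log outputs are clamped to at least -100, but working on the logits avoids relying on that clamp.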

Besides that, you should remove the Variable calls, as they have been deprecated since 0.4.0.
Also, using .data is not recommended, as it might yield silent errors, since Autograd cannot track these operations.
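
For the commented-out accuracy lines, a pixel accuracy can be computed without Variable or .data. This is only a sketch, assuming the model returns raw logits for a single foreground class and the masks are 0/1 tensors of the same shape; pixel_accuracy is a hypothetical helper name, not something from the original notebook.

```python
import torch

def pixel_accuracy(logits, target, threshold=0.5):
    # Assumes `logits` are raw model outputs and `target` holds 0/1 values.
    with torch.no_grad():
        preds = (torch.sigmoid(logits) > threshold).float()
        correct = (preds == target).sum().item()
    return correct / target.numel()

logits = torch.tensor([[[[2.0, -1.0], [0.5, -3.0]]]])  # shape (1, 1, 2, 2)
target = torch.tensor([[[[1.0, 0.0], [0.0, 0.0]]]])
print(pixel_accuracy(logits, target))  # 3 of 4 pixels match
```

If the model still ends in a sigmoid, the outputs are already probabilities, so the torch.sigmoid call inside the helper should be dropped.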

Thanks for your reply!

So basically I would comment out out = F.sigmoid(out) and replace the loss with
out = nn.BCEWithLogitsLoss(out)?

I was also wondering whether the very slow decrease of the loss is due to my RGB images not being normalized (e.g. with the ResNet values or with their own mean and std).

Currently I’m loading my data by using the following code:

class MyDataset_test(Dataset):
    def __init__(self, image_paths, mask_paths, train=True):
        self.image_paths = image_paths
        self.mask_paths = mask_paths

    def transforms(self, image, mask):
        #img = img.resize((wsize, baseheight), PIL.Image.ANTIALIAS)
        #image = transforms.Resize(size=(64, 64))(image)
        #mask = transforms.Resize(size=(64, 64))(mask)
        image = image.resize((64, 64), PIL.Image.NEAREST)
        mask = mask.resize((64, 64), PIL.Image.NEAREST)
        image = TF.to_tensor(image)
        mask = TF.to_tensor(mask)
        return image, mask

    def __getitem__(self, index):
        image = Image.open(self.image_paths[index])
        mask = Image.open(self.mask_paths[index])
        x, y = self.transforms(image, mask)
        return x, y

    def __len__(self):
        return len(self.image_paths)

Maybe this could be the issue.

Thanks in advance!

Use nn.BCEWithLogitsLoss as the criterion instead of nn.BCELoss and pass the raw logits to it by removing the sigmoid.
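
A minimal sketch of that swap, using random tensors as stand-ins for the UNet output and the masks (the shapes are arbitrary and chosen only for illustration). It also checks that the new criterion on logits gives the same value as the old sigmoid plus nn.BCELoss setup for well-behaved inputs:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(2, 1, 4, 4)                    # stand-in for raw UNet output
target = torch.randint(0, 2, (2, 1, 4, 4)).float()  # stand-in for binary masks

criterion = nn.BCEWithLogitsLoss()
loss_from_logits = criterion(logits, target)

# Old setup for comparison: sigmoid inside the model, then nn.BCELoss.
loss_from_probs = nn.BCELoss()(torch.sigmoid(logits), target)

print(loss_from_logits.item(), loss_from_probs.item())
```

With this change the model's forward returns the output of conv_last directly; torch.sigmoid is then only applied when probabilities are needed, e.g. for visualizing predictions.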

Normalization might help your training, so you should definitely try it out.
Also, you could try to overfit a small data sample to make sure your current training routine works.
Once this works, you could scale it up again.
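
The overfitting check could look roughly like this. It is a sketch under assumptions: the data are random stand-ins with masks derived from the images so that there is something learnable, and the tiny network is only a placeholder for the real model.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Random stand-ins for a handful of images; masks are derived from the images.
X = torch.rand(4, 3, 16, 16)
y = (X.mean(dim=1, keepdim=True) > 0.5).float()

# Tiny placeholder network; swap in the real model here.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 1, 3, padding=1),
)
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

losses = []
for step in range(200):
    optimizer.zero_grad()
    loss = criterion(model(X), y)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

If the loss on such a tiny fixed sample does not keep dropping towards zero with the real model and data, the problem is more likely in the data pipeline or the training loop than in the dataset size.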

I’m now using nn.BCEWithLogitsLoss as you suggested and tried the model on a small dataset of 15 images (which should be small enough). The loss now goes down very fast, but it still saturates between 0.084 and 0.086.
So I guess this indicates a more serious problem with the model or the data?

Out of curiosity, I also tried the ResNetUnet example, and again the loss saturates around 0.083 on a small training set.

I’ve also normalized my data with

    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
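
For reference, the alternative of using the dataset's own statistics instead of the ImageNet values above can be sketched as follows. The images here are random stand-ins, and the arithmetic is the usual per-channel (x - mean) / std:

```python
import torch

torch.manual_seed(0)
images = torch.rand(15, 3, 64, 64)  # stand-in for to_tensor() outputs in [0, 1]

# Per-channel statistics over batch, height and width.
mean = images.mean(dim=(0, 2, 3))
std = images.std(dim=(0, 2, 3))

# Per-channel normalization: (x - mean) / std, broadcast over N, H, W.
normalized = (images - mean[None, :, None, None]) / std[None, :, None, None]

print(normalized.mean(dim=(0, 2, 3)), normalized.std(dim=(0, 2, 3)))
```

Only the input image should be normalized this way; the mask has to stay in the 0/1 range expected by the loss.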

Any suggestion is more than welcome. Thanks again!

I was looking at this post (UNet implementation, a bit old), where there were apparently some issues implementing U-Net in PyTorch. I could not find a real solution so far; in one reply, Saed only wrote:

"For the last set of convolutions, that is 128-> 64 -> 64 -> 1, the activation function should not be used!
The activation function causes the values to vanish!

I just removed the nn.ReLU() modules on top of these convolution layers and now everything works like a charm!"
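
What the quote describes could be sketched like this. The activation flag is a hypothetical addition made for this example, and whether dropping the activations helps in my case is untested:

```python
import torch
import torch.nn as nn

def double_conv(in_channels, out_channels, activation=True):
    # `activation` is a hypothetical flag added for this sketch.
    layers = []
    for c_in in (in_channels, out_channels):
        layers.append(nn.Conv2d(c_in, out_channels, 3, padding=1))
        if activation:
            layers.append(nn.ReLU(inplace=True))
    return nn.Sequential(*layers)

# Last decoder stage 128 -> 64 -> 64 -> 1 without activations, as in the quote.
head = nn.Sequential(
    double_conv(128, 64, activation=False),
    nn.Conv2d(64, 1, 1),
)
out = head(torch.randn(1, 128, 8, 8))
print(out.shape)
```

The earlier encoder and decoder blocks would keep activation=True; only the final block before the 1x1 convolution is built without ReLU.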