Loss value on U-net

I am currently working on semantic segmentation on the VOC2012 dataset with U-Net. The problem is that the loss drops slowly and stops moving after a few epochs.
Here are my training function and model:

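def train(args, model, device, train_loader, optimizer, epoch):
    model.train()
    criterion = nn.CrossEntropyLoss()  # built once, not on every batch
    running_loss = 0.0
    total_train = 0
    running_correct = 0.0
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)  # (N, num_classes, H, W) logits
        # target is (N, 1, H, W); drop the channel dim for CrossEntropyLoss
        loss = criterion(output, target[:, 0])
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        # predicted is (N, H, W); eq() broadcasts it against the 4-D target
        predicted = torch.argmax(output.data, 1)
        total_train += target.nelement()
        running_correct += predicted.eq(target.data).sum().item()
    print(epoch, running_correct / total_train)  # pixel accuracy
    # loss.item() is already a per-batch mean, so average over batches
    print(running_loss / len(train_loader))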
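
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class UNetEnc(nn.Module):
    """Expanding-path block: two 3x3 convs, then a 2x transposed-conv upsample."""
    def __init__(self, in_channels, features, out_channels):
        super().__init__()
        # ReLU after each layer is assumed here (standard U-Net practice)
        self.up = nn.Sequential(
            nn.Conv2d(in_channels, features, 3),  # kernel size 3, no padding
            nn.ReLU(inplace=True),
            nn.Conv2d(features, features, 3),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(features, out_channels, 2, stride=2),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.up(x)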

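class UNetDec(nn.Module):
    """Contracting-path block: two 3x3 convs, optional dropout, 2x max-pool."""
    def __init__(self, in_channels, out_channels, dropout=False):
        super().__init__()
        # ReLU after each conv is assumed here (standard U-Net practice)
        layers = [
            nn.Conv2d(in_channels, out_channels, 3),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, 3),
            nn.ReLU(inplace=True),
        ]
        if dropout:
            layers += [nn.Dropout(.5)]
        layers += [nn.MaxPool2d(2, stride=2, ceil_mode=True)]

        self.down = nn.Sequential(*layers)

    def forward(self, x):
        return self.down(x)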

class UNet(nn.Module):

    def __init__(self, num_classes):
        super().__init__()

        # contracting path (named "dec" in this code)
        self.dec1 = UNetDec(3, 64)
        self.dec2 = UNetDec(64, 128)
        self.dec3 = UNetDec(128, 256)
        self.dec4 = UNetDec(256, 512, dropout=True)
        # ReLU after each layer is assumed here (standard U-Net practice)
        self.center = nn.Sequential(
            nn.Conv2d(512, 1024, 3),
            nn.ReLU(inplace=True),
            nn.Conv2d(1024, 1024, 3),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(1024, 512, 2, stride=2),
            nn.ReLU(inplace=True),
        )
        # expanding path (named "enc" in this code)
        self.enc4 = UNetEnc(1024, 512, 256)
        self.enc3 = UNetEnc(512, 256, 128)
        self.enc2 = UNetEnc(256, 128, 64)
        self.enc1 = nn.Sequential(
            nn.Conv2d(128, 64, 3),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3),
            nn.ReLU(inplace=True),
        )
        self.final = nn.Conv2d(64, num_classes, 1)

    def forward(self, x):
        def resize(t, size):
            # same behavior as the deprecated F.upsample_bilinear
            return F.interpolate(t, size=size, mode='bilinear', align_corners=True)

        dec1 = self.dec1(x)
        dec2 = self.dec2(dec1)
        dec3 = self.dec3(dec2)
        dec4 = self.dec4(dec3)
        center = self.center(dec4)
        # each skip connection is resized to match before concatenation
        enc4 = self.enc4(torch.cat([
            center, resize(dec4, center.size()[2:])], 1))
        enc3 = self.enc3(torch.cat([
            enc4, resize(dec3, enc4.size()[2:])], 1))
        enc2 = self.enc2(torch.cat([
            enc3, resize(dec2, enc3.size()[2:])], 1))
        enc1 = self.enc1(torch.cat([
            enc2, resize(dec1, enc2.size()[2:])], 1))

        # scale the class scores back up to the input resolution
        return resize(self.final(enc1), x.size()[2:])

And my optimizer:

optimizer = optim.Adam(model.parameters(), lr=0.0001)

Here is the output:

1 0.5320905093947355
2 0.5344406446928762
3 0.537645096729026
4 0.538817567193965
5 0.5387050243204894

Thank you if anyone knows how I can improve the performance.

Hi Jason!

This looks a little suspicious. First you use CrossEntropyLoss to
compare output with target, but with one of target's dimensions
selected away. But then in your accuracy calculation you perform the
.eq() test between output, with one dimension reduced away by
.argmax(), and target with all of its dimensions.

What are the shapes of output and target? What are the meanings
of their elements? In particular, why are you selecting away one of
target's dimensions in your loss calculation? (Is it a “channels” or
“classes” dimension?)

Also, could you outline in a short sentence the problem you are
working on?

It’s hard to have much certainty without knowing the details of your
use case, but it’s not at all clear that your training isn’t progressing

Your loss is moving steadily down, and your accuracy is moving
up, albeit slowly, with a slight downtick for the last epoch for which
you gave us results. This doesn’t seem so bad.

Also, for many realistic problems, five epochs is just getting started,
and not really enough to draw conclusions about the training (but
this depends on your use case and how large your training set is).
What do the loss and accuracy curves look like if you train for, say,
a hundred or several hundred epochs?


K. Frank

Thank you for your reply.
The shapes of output, target and predicted are as follows:

output torch.Size([1, 22, 256, 256])
#(Batch size, nClasses, H, W)
target torch.Size([1, 1, 256, 256])
#(Batch size, Channels, H, W)
predicted torch.Size([1, 256, 256])
#(Batch size, H, W)

I am dropping the channel dimension for the loss calculation because I think that’s the input shape CrossEntropyLoss requires. Do you know whether my accuracy calculation is correct?

I am working on a semantic segmentation problem, and my training set is about 2000 images.

I found that the loss keeps dropping, but a bit slowly. I’m trying to run for more epochs now.

Hi Jason!

Yes, that should be okay then. The “extra” dimension in target is
a singleton (size-1) dimension, so, in some sense, it doesn’t really
count. (More precisely, when you “select it away” you don’t lose
any information – the resulting tensor with one fewer dimension
still has the same number of elements.)

Yes, this is correct. You do need to get rid of target's “extra” singleton
dimension to keep CrossEntropyLoss from complaining.
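
To illustrate the shape requirement, here is a minimal sketch using
random tensors with the shapes from this thread. The tensor contents
are made up; only the shapes and the class count come from the post.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Shapes from the thread: batch size 1, 22 classes, 256x256 images.
output = torch.randn(1, 22, 256, 256)            # (N, C, H, W) raw scores
target = torch.randint(0, 22, (1, 1, 256, 256))  # (N, 1, H, W) class indices

criterion = nn.CrossEntropyLoss()

# Indexing and squeeze(1) both remove the singleton channel dimension.
target_indexed = target[:, 0]        # (N, H, W)
target_squeezed = target.squeeze(1)  # (N, H, W)

# Dropping a size-1 dimension keeps every element.
same_count = target_indexed.numel() == target.numel()

# With class-index targets, the loss takes (N, C, H, W) and (N, H, W).
loss = criterion(output, target_indexed)  # scalar, averaged over all pixels
print(target_indexed.shape, same_count, loss.dim())
```

Either target[:, 0] or target.squeeze(1) gives the same tensor here;
squeeze(1) just makes the intent a bit more explicit.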

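It looks correct for the shapes you posted. (But don’t trust me: test
and double-check it.) Note that because predicted and target don’t
have the same shape (target has that extra dimension), broadcasting
will occur when you evaluate predicted.eq(target.data). With a batch
size of 1, as in your shapes, the batch and singleton dimensions both
have size 1, so you’ll get the same result and nothing will break.
With a larger batch size, though, broadcasting would compare every
prediction in the batch against every target, so it is safer to
compare against target[:, 0].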
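
A quick way to double-check the broadcasting is a toy comparison like
the sketch below. The helper count_matches and the tiny 4x4 masks are
made up for illustration. It compares a "perfect" prediction against
its target both ways, once for batch size 1 (as in the shapes posted
here) and once for batch size 2.

```python
import torch

torch.manual_seed(0)

def count_matches(batch_size, h=4, w=4, num_classes=22):
    """Compare a perfect prediction against its target in two ways."""
    target = torch.randint(0, num_classes, (batch_size, 1, h, w))  # (N, 1, H, W)
    predicted = target[:, 0].clone()                               # (N, H, W)
    broadcast = predicted.eq(target)      # shapes differ, so broadcasting applies
    aligned = predicted.eq(target[:, 0])  # shapes match, elementwise comparison
    return tuple(broadcast.shape), broadcast.sum().item(), aligned.sum().item()

shape_1, broadcast_1, aligned_1 = count_matches(batch_size=1)
shape_2, broadcast_2, aligned_2 = count_matches(batch_size=2)
print(shape_1, broadcast_1, aligned_1)  # batch size 1: both counts agree
print(shape_2, broadcast_2, aligned_2)  # batch size 2: result has shape (2, 2, 4, 4)
```

With batch size 1 the two counts agree. With batch size 2 the
broadcast result has shape (N, N, H, W), so the match count is no
longer a per-pixel count, and comparing against target[:, 0] avoids
the issue.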

I haven’t used U-Net and I don’t really have any intuition about it,
but I imagine that 2000 training images should be enough to train
reasonably well.

I do notice that your accuracy starts off (that is, after just one epoch
of training) at about 50%. Given that you have 22 classes, this seems
unexpectedly high. Random guessing with 22 uniformly distributed
classes would give you an accuracy of only about 5%.

Is it possible that one of your classes is a “background” class and
a large fraction of your pixels are background pixels? For example,
if half of your pixels were background pixels, you would get 50%
accuracy by always predicting background.
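
As a rough check of those two baselines, something like the sketch
below could be run on the label masks. The helper baseline_accuracies
and the toy 4x4 mask are hypothetical, and class 0 is only assumed to
be the background class.

```python
import torch

def baseline_accuracies(target, num_classes):
    """Return chance accuracy, majority-class accuracy and per-class pixel counts."""
    counts = torch.bincount(target.flatten(), minlength=num_classes)
    chance = 1.0 / num_classes                        # uniform random guessing
    majority = counts.max().item() / target.numel()  # always predict the top class
    return chance, majority, counts

# Toy mask in which half of the pixels belong to class 0.
toy_target = torch.tensor([[[0, 0, 0, 0],
                            [0, 0, 0, 0],
                            [1, 1, 2, 2],
                            [3, 3, 5, 5]]])

chance, majority, counts = baseline_accuracies(toy_target, num_classes=22)
print(f"chance: {chance:.3f}, majority-class: {majority:.3f}")
```

If the majority-class number on the real masks comes out near the
accuracy seen after the first epoch, the model may mostly be
predicting that one class at that point.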

If your data set is unbalanced because some classes are rare or
especially common (such as in the background-pixel example),
you might want to try CrossEntropyLoss's weight constructor
argument (per-class weights) to help counteract the class imbalance.
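
Here is a minimal sketch of how the weights might be built. The
inverse-frequency formula is just one common heuristic and is an
assumption on my part, as are the toy masks. In practice the counts
would be accumulated over the whole training set.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
num_classes = 22

# Toy masks: 75% of the pixels are class 0, the rest are spread over classes 1..21.
target = torch.randint(1, num_classes, (2, 8, 8))  # (N, H, W)
target[:, :6, :] = 0

# Per-class pixel counts.
counts = torch.bincount(target.flatten(), minlength=num_classes).float()

# Inverse-frequency weights; clamp so absent classes do not divide by zero.
weights = counts.sum() / (num_classes * counts.clamp(min=1.0))

# weight must be a 1-D float tensor with one entry per class.
criterion = nn.CrossEntropyLoss(weight=weights)

output = torch.randn(2, num_classes, 8, 8)  # (N, C, H, W) raw scores
loss = criterion(output, target)
print(weights[0].item(), weights[1:].min().item(), loss.item())
```

The common class ends up with the smallest weight, so its pixels
contribute less to the loss. If the model runs on a GPU, the weight
tensor needs to be moved to the same device.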


K. Frank

Thank you so much for your help. An update on the training process: after 100 epochs the accuracy is about 80%, so I think the model is fine.

Yes, the labels contain a large amount of background. Thank you for your suggestion about CrossEntropyLoss with weights. I think it will be useful and I am trying it out.