Semantic Segmentation using deeplabv3+resnet101 from torchvision models

I am using DeepLabV3 + ResNet-101 to perform binary semantic segmentation.

import torch
import torchvision
import loader
from loader import DataLoaderSegmentation
import torch.nn as nn
import torch.optim as optim
import numpy as np
from torch.utils.data.sampler import SubsetRandomSampler

batch_size = 1
validation_split = .2
shuffle_dataset = True
random_seed= 66

n_class    = 2
num_epochs = 1
lr         = 1e-4
momentum   = 0.9
w_decay    = 1e-5
step_size  = 50
gamma      = 0.5
traindata = DataLoaderSegmentation('/home/ubuntu/Downloads/imgs/lensonly/')
model = torchvision.models.segmentation.fcn_resnet101(pretrained=False, progress=True, num_classes=2).cuda()
criterion = nn.CrossEntropyLoss().cuda()
optimizer = optim.SGD(model.parameters(), lr=lr, momentum=momentum, weight_decay=w_decay)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

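# Assumed: the DataLoader, device transfers, optimizer steps and torch.save call
# are reconstructed; they are not legible in the post as shown.
trainloader = torch.utils.data.DataLoader(traindata, batch_size=batch_size, shuffle=shuffle_dataset)

for epoch in range(num_epochs):
    for (i, l) in trainloader:
        i = i.to(device)
        l = l.to(device)
        outt = model(i)
        loss = criterion(outt['out'], l.squeeze(0))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print(epoch)
torch.save(model, '/home/ubuntu/Downloads/newnet.pth')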

Once the model is trained, I evaluate it on one of the images:

import torch
import torchvision
import numpy as np
import cv2  # used by cv2.imread below
from torch.utils.data.sampler import SubsetRandomSampler
from PIL import Image
import matplotlib.pyplot as plt
import torchvision.transforms as T
img = cv2.imread('/home/ubuntu/Downloads/Brain/test0716/train/slice_src_BN01002_032.png')
trf = T.Compose([T.ToTensor(),T.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
inp = trf(img).unsqueeze(0).cuda()
fcn = torch.load('newnet.pth')
sam_out = fcn(inp)['out']
om = torch.argmax(sam_out.squeeze(), dim=0).cpu().numpy()

The output I get for “sam_out” has a shape of [1, 2, 417, 417]. It is given below:

>>> sam_out
tensor([[[[ 2.3502,  2.3502,  2.3502,  ...,  1.9911,  1.9911,  1.9911],
          [ 2.3502,  2.3502,  2.3502,  ...,  1.9911,  1.9911,  1.9911],
          [ 2.3502,  2.3502,  2.3502,  ...,  1.9911,  1.9911,  1.9911],
          [ 1.8227,  1.8227,  1.8227,  ...,  2.0846,  2.0846,  2.0846],
          [ 1.8227,  1.8227,  1.8227,  ...,  2.0846,  2.0846,  2.0846],
          [ 1.8227,  1.8227,  1.8227,  ...,  2.0846,  2.0846,  2.0846]],

         [[-1.7641, -1.7641, -1.7641,  ..., -1.8655, -1.8655, -1.8655],
          [-1.7641, -1.7641, -1.7641,  ..., -1.8655, -1.8655, -1.8655],
          [-1.7641, -1.7641, -1.7641,  ..., -1.8655, -1.8655, -1.8655],
          [-1.6157, -1.6157, -1.6157,  ..., -1.8989, -1.8989, -1.8989],
          [-1.6157, -1.6157, -1.6157,  ..., -1.8989, -1.8989, -1.8989],
          [-1.6157, -1.6157, -1.6157,  ..., -1.8989, -1.8989, -1.8989]]]],
       device='cuda:0', grad_fn=<UpsampleBilinear2DBackward>)

Notice that if I apply argmax over the two [417, 417] output channels, every pixel is labeled 0 and no pixel is labeled 1. This happens because all values in the first channel are positive while all values in the second channel are negative. Am I missing something? Any and all help is appreciated.
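To make the observation concrete, here is a minimal sketch with a tiny stand-in tensor. The values are copied from the printout above, but the 2x2 spatial size is an assumption made for brevity. Argmax over the class dimension picks channel 0 wherever its score is larger, and softmax shows how little probability is left for class 1.

```python
import torch

# Hypothetical stand-in for sam_out: shape [1, 2, 2, 2] instead of [1, 2, 417, 417].
logits = torch.tensor([[[[2.3502, 1.9911],
                         [1.8227, 2.0846]],       # channel 0 (class 0)
                        [[-1.7641, -1.8655],
                         [-1.6157, -1.8989]]]])   # channel 1 (class 1)

# argmax over dim=1 (the class dimension) keeps the batch dimension intact.
pred = torch.argmax(logits, dim=1)       # shape [1, 2, 2]
probs = torch.softmax(logits, dim=1)     # per-pixel class probabilities

print(pred)                # every entry is 0
print(probs[0, 1].max())   # largest class-1 probability in this example
```

Using dim=1 instead of squeeze() followed by dim=0 gives the same labels here and also works for batches larger than one.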

Does the training look alright, i.e. do you get predictions for both classes?
I’m not sure why you are squeezing the target, but I assume you somehow have an additional dimension in your data?

If the training looks fine, did you make sure to apply the same transformation during training and evaluation?

The code looks alright. At least I cannot find any obvious bugs.

Thanks for your response. Below are my responses to your question.

  1. Regarding squeezing the target.
    The shape for my model output is [1, 2, 417, 417] and the shape of the target (label) is [1, 1, 417, 417]. The loss function only worked when I squeezed the 0th dimension.

  2. I am a little confused by one of your questions:
    “do you get predictions for both classes?”
    Did you mean: while training, is my network predicting the second label at all (even if it is not correct)?

  3. Yes, I am applying the same transform while evaluating the model. The only transform I am applying is ToTensor().

Thanks for your continued help

Thanks for the information!

  1. It seems dim1 is unnecessary, but based on the current shape, it doesn’t matter if you squeeze dim0 or dim1.

  2. Yes, I would just like to know whether the model performance is already bad during training and whether the model also outputs a single class then.

  3. It looks like you are also normalizing, which would make a difference if it were only applied during testing.
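As a sketch of the point about squeezing (the tensor sizes here are made up): nn.CrossEntropyLoss takes scores of shape [N, C, H, W] and class-index targets of shape [N, H, W]. With a batch of one, squeezing dim 0 or dim 1 of a [1, 1, H, W] target gives the same shape, but with a larger batch only squeeze(1) removes the extra dimension.

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

# Batch of one: squeezing dim 0 or dim 1 of a [1, 1, H, W] target gives [1, H, W] either way.
out_single = torch.randn(1, 2, 8, 8)               # scores, [N, C, H, W]
tgt_single = torch.randint(0, 2, (1, 1, 8, 8))     # labels with an extra channel dim
same_shape = tgt_single.squeeze(0).shape == tgt_single.squeeze(1).shape
loss_single = criterion(out_single, tgt_single.squeeze(1))

# Batch of four: squeeze(0) is a no-op here, so only squeeze(1) yields [N, H, W].
out_batch = torch.randn(4, 2, 8, 8)
tgt_batch = torch.randint(0, 2, (4, 1, 8, 8))
shape_dim0 = tuple(tgt_batch.squeeze(0).shape)     # (4, 1, 8, 8)
shape_dim1 = tuple(tgt_batch.squeeze(1).shape)     # (4, 8, 8)
loss_batch = criterion(out_batch, tgt_batch.squeeze(1))
```

Squeezing dim 1 is therefore the safer habit if the batch size is ever increased.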

Thanks for your response.

Response to

  1. I checked the unique labels in the model's predictions during each training iteration.
    The model predicts both labels 0 and 1 during the initial stages of training. Once it reaches 99% accuracy, further training for some reason reduces the predictions to a single label, 0. Also, the 99% accuracy is reached halfway through the first epoch, and I am only training for 1 epoch.

  2. I changed my code so that normalization is not applied during either training or testing.

Thanks for your help

I forgot to mention one more thing. When loading my label I convert it to a tensor with ToTensor(), which divides each pixel value by 255. This changes my label values from [0, 1] to [0, 1/255]. I manually change this back by assigning label[label!=0]=1, followed by label=label.long(). This label is then passed to the loss function. I don't know if this would change anything.

After much searching I came across another post with the same problem, but there doesn’t seem to be a solution posted.

Hi @ptrblck
There has been an update: I included ignore_index in the loss.

criterion = nn.CrossEntropyLoss(ignore_index=0)

Now my model has predictions for both class 0 and class 1 during and after training. I am still waiting to run it for more epochs to see the final result.
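One thing worth keeping in mind about this change: ignore_index=0 removes every class-0 pixel from the loss, so the model is no longer penalized for predicting class 1 on background. A minimal sketch with made-up tensors (the 4x4 size and the score values are assumptions) compares a sensible prediction with one that labels everything as class 1:

```python
import torch
import torch.nn as nn

target = torch.zeros(1, 4, 4, dtype=torch.long)
target[0, 0, 0] = 1                            # a single foreground pixel

# Two hypothetical predictions: both get the foreground pixel right,
# but the second one labels every background pixel as class 1 as well.
good = torch.zeros(1, 2, 4, 4)
good[:, 0] = 5.0                               # strongly predicts class 0 everywhere...
good[0, :, 0, 0] = torch.tensor([0.0, 5.0])    # ...except at the foreground pixel
all_ones = torch.zeros(1, 2, 4, 4)
all_ones[:, 1] = 5.0                           # predicts class 1 everywhere

plain = nn.CrossEntropyLoss()
ignoring = nn.CrossEntropyLoss(ignore_index=0)

plain_good, plain_bad = plain(good, target), plain(all_ones, target)
ign_good, ign_bad = ignoring(good, target), ignoring(all_ones, target)

print(plain_good.item(), plain_bad.item())     # the plain loss separates the two
print(ign_good.item(), ign_bad.item())         # with ignore_index=0 they are identical
```

With the plain loss the all-ones prediction is clearly worse; with ignore_index=0 the two are indistinguishable, which is why this setting can trade one degenerate solution for another.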


If your model reaches 99% accuracy, it starts to predict only zeros?
If so, it looks like you are dealing with an imbalanced segmentation use case, i.e. 99% of your pixel labels are 0s, thus your model just learns the majority class.
A simple function like:

def predict(image):
    return torch.zeros_like(image)

would therefore also achieve a 99% accuracy.
Could this be the case? If so, you should try e.g. class weighting in your loss function to counter the overfitting on the majority class.
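A small sketch of why pixel accuracy is misleading here, using a made-up 100x100 mask in which 1% of the pixels belong to class 1. The always-background predictor scores 99% accuracy, yet its intersection over union for the foreground class is zero.

```python
import torch

# Hypothetical 100x100 mask in which 1% of the pixels belong to class 1.
target = torch.zeros(100, 100, dtype=torch.long)
target[:10, :10] = 1

pred = torch.zeros_like(target)          # the "always background" predictor

pixel_acc = (pred == target).float().mean().item()

# Intersection over union for class 1 only.
inter = ((pred == 1) & (target == 1)).sum().item()
union = ((pred == 1) | (target == 1)).sum().item()
iou_fg = inter / union

print(pixel_acc, iou_fg)
```

Tracking a per-class metric such as IoU alongside accuracy makes this failure mode visible during training.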

This might work and seems reasonable, but I would recommend removing the ToTensor() transformation for your label and just transforming it manually to a tensor, e.g. by using torch.from_numpy in your __getitem__.
Also be careful about resizing the segmentation target, as interpolations other than nearest neighbor will change your labels (if you are using resizing somewhere).
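Both recommendations can be sketched as follows, assuming the mask is available as a uint8 NumPy array (the 4x4 mask and the 6x6 target size are made up). torch.from_numpy keeps the integer labels as they are, and nearest-neighbor resizing keeps them in {0, 1}, whereas bilinear resizing produces fractional values.

```python
import numpy as np
import torch
import torch.nn.functional as F

# Hypothetical 4x4 binary mask stored as uint8, as it might come from np.array(pil_image).
mask_np = np.zeros((4, 4), dtype=np.uint8)
mask_np[1:3, 1:3] = 1

# torch.from_numpy keeps the integer values; no division by 255 happens.
mask = torch.from_numpy(mask_np).long()

# F.interpolate expects a float [N, C, H, W] tensor.
mask_4d = mask[None, None].float()
nearest = F.interpolate(mask_4d, size=(6, 6), mode="nearest")
bilinear = F.interpolate(mask_4d, size=(6, 6), mode="bilinear", align_corners=False)

print(torch.unique(nearest))    # only 0 and 1
print(torch.unique(bilinear))   # also contains fractional values
```

After a nearest-neighbor resize the mask can be cast back with .long() and passed to the loss unchanged.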

Your ignore_index=0 approach seems also to point to an imbalanced target distribution.
Could you count the class labels for a few target tensors using:
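A minimal sketch of such a count, assuming the targets are integer tensors (the `targets` batch below is made up for illustration):

```python
import torch

# Hypothetical batch of targets: 4 masks of 64x64 with a small foreground patch.
targets = torch.zeros(4, 64, 64, dtype=torch.long)
targets[:, :4, :4] = 1

# Per-target counts of each label that is present.
for idx, target in enumerate(targets):
    labels, counts = torch.unique(target, return_counts=True)
    print(idx, labels.tolist(), counts.tolist())

# Counts over all targets at once; minlength keeps a slot for absent classes.
total = torch.bincount(targets.flatten(), minlength=2)
print(total.tolist())
```

torch.bincount with minlength is convenient when some masks contain no foreground at all, since torch.unique would then return a single entry.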


Thank you for your help. Extracting class weights using the mean frequency balancing method and passing them to CrossEntropyLoss solved my problem.
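The post does not show how the weights were computed. One common variant is median frequency balancing, where each class weight is median(freq) / freq_c. The sketch below uses that formula on made-up pixel counts and may differ from what was actually used here.

```python
import torch
import torch.nn as nn

# Made-up pixel counts per class over a training set: [background, foreground].
pixel_counts = torch.tensor([990_000.0, 10_000.0])

freq = pixel_counts / pixel_counts.sum()     # class frequencies
weights = freq.median() / freq               # rarer class -> larger weight

criterion = nn.CrossEntropyLoss(weight=weights)

# Dummy scores [N, C, H, W] and labels [N, H, W] to show the call.
logits = torch.randn(2, 2, 16, 16)
labels = torch.randint(0, 2, (2, 16, 16))
loss = criterion(logits, labels)

print(weights.tolist(), loss.item())
```

Whatever formula is used, only the ratio between the class weights matters for how strongly the rare class is emphasized.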

Thanks again for your help


Hi Nishanth and ptrblck,

Thanks for the discussion. I have a similar problem of detecting thin wires (power lines on utility poles). In my case, I face the same problem of background pixels being in abundance and the masked pixels of wire being negligible compared to the background class. Could you please explain how to pass weights to the loss function? I’m currently using nn.NLLLoss().

Help is needed urgently; any help would be greatly appreciated.

Here’s the code I’m using.

from __future__ import print_function, division
import os
import torch
import numpy as np
import matplotlib.pyplot as plt
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms, utils
from PIL import Image # w, h when an image is opened
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
from torchvision import datasets, models
import time
import copy
from tqdm import tqdm

import warnings

class semanticSegmentationDataset(Dataset):
    """Semantic Segmentation Dataset."""    

    def __init__(self, root_dir, transformList=None):
        """
        Args:
            root_dir (string): train or test Directory with the images and masks folders.
            transformList (callable, optional): Optional transforms to be applied on a sample.
        """
        self.root_dir = root_dir
        self.transformList = transformList
        self.imgs = list(sorted(os.listdir(os.path.join(root_dir, "images"))))
        self.msks = list(sorted(os.listdir(os.path.join(root_dir, "masks"))))

    def __getitem__(self, idx):

        if torch.is_tensor(idx):
            idx = idx.tolist()

        img_path = os.path.join(self.root_dir, "images", self.imgs[idx])
        msk_path = os.path.join(self.root_dir, "masks", self.msks[idx])
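        # Assumed reconstruction: PIL is used because Rescale reads image.size as (w, h)
        image = Image.open(img_path)
        mask = Image.open(msk_path)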
        sample = {'image': image, 'mask': mask}

        if self.transformList:
            sample = self.transformList(sample)

        return sample

    def __len__(self):
        return len(self.imgs)

class Rescale(object):
    """Rescale the image in a sample to a given size.

    Args:
        output_size (tuple or int): Desired output size. If tuple, output is
            matched to output_size. If int, smaller of image edges is matched
            to output_size keeping aspect ratio the same.
    """

    def __init__(self, output_size):
        assert isinstance(output_size, (int, tuple))
        self.output_size = output_size

    def __call__(self, sample):
        image, mask = sample['image'], sample['mask']
        w, h = image.size

        if isinstance(self.output_size, int):
            if h > w:
                new_h, new_w = self.output_size * h / w, self.output_size
            else:
                new_h, new_w = self.output_size, self.output_size * w / h
        else:
            new_h, new_w = self.output_size

        new_h, new_w = int(new_h), int(new_w)

        imageT = transforms.Compose([transforms.Resize((new_h,new_w))])
        maskT = transforms.Compose([transforms.Resize((new_h,new_w), interpolation = 0)])

        resizedImg = imageT(image)
        resizedMsk = maskT(mask)

        return {'image': resizedImg, 'mask': resizedMsk}

class ToTensor(object):
    """Convert ndarrays in sample to Tensors."""

    def __call__(self, sample):
        image, mask = sample['image'], sample['mask']

        imageT = transforms.Compose([transforms.ToTensor()])
        tensorImage = imageT(image)

        return {'image': tensorImage, 'mask': torch.from_numpy(np.array(mask)).long()}

class Normalize(object):
    """Normalize image tensors with the ImageNet channel means and standard deviations."""

    def __call__(self, sample):
        image, mask = sample['image'], sample['mask']
        imageT = transforms.Compose([transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                                          std=[0.229, 0.224, 0.225])])
        normImage = imageT(image)
        return {'image': normImage, 'mask': mask}

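composedTransformList = transforms.Compose([Rescale((576,378)),
                                            ToTensor(),    # assumed: the rest of this list is reconstructed
                                            Normalize()])  # from the transform classes defined above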

from torchvision.models.segmentation.deeplabv3 import DeepLabHead

def createDeepLabv3(num_classes):
    model = models.segmentation.deeplabv3_resnet101(pretrained=True, progress=True)
    # Replace the classifier head so that it outputs num_classes channels (raw scores, no final activation)
    model.classifier = DeepLabHead(2048, num_classes)
    return model

def train_model(model, criterion, optimizer, dataloaders, device, scheduler, num_epochs=25, print_freq=1):
    since = time.time()

    best_model_wts = copy.deepcopy(model.state_dict())
    best_loss = 1e15
    # Keep the history outside the epoch loop so it is not reset every epoch
    loss_history = {'train': [], 'val': []}

    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)

        # Each epoch has a training and validation phase
        for phase in ['train', 'val']:
            if phase == 'train':
                model.train()  # Set model to training mode
            else:
                model.eval()   # Set model to evaluate mode

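            # Iterate over data.
            batch_losses = []
            for sample in tqdm(iter(dataloaders[phase])):
                imgs = sample['image'].to(device)
                msks = sample['mask'].to(device)

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward
                # track history if only in train
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(imgs)
                    loss = criterion(outputs['out'], msks)

                    # backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                batch_losses.append(loss.item())

            if phase == 'train':
                scheduler.step()

            # Assumed reconstruction: epoch loss taken as the mean of the per-batch losses
            epoch_loss = float(np.mean(batch_losses))
            loss_history[phase].append(epoch_loss)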
            if (epoch + 1) % print_freq == 0:
                print('Epoch: [%d/%d], Loss: %.4f' %(epoch+1, num_epochs, epoch_loss))

            # deep copy the model
            if phase == 'val' and epoch_loss < best_loss:
                best_loss = epoch_loss
                best_model_wts = copy.deepcopy(model.state_dict())

    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(time_elapsed // 60, time_elapsed % 60))
    print('Best val loss: {:.4f}'.format(best_loss))

    # load best model weights
    model.load_state_dict(best_model_wts)
    return model, loss_history

trainDataset = semanticSegmentationDataset(root_dir='data/train/', transformList=composedTransformList)
valDataset = semanticSegmentationDataset(root_dir='data/val/', transformList=composedTransformList)
testDataset = semanticSegmentationDataset(root_dir='data/test/', transformList=composedTransformList)

trainDataloader = DataLoader(trainDataset, batch_size=4, shuffle=True, num_workers=4)
valDataloader = DataLoader(valDataset, batch_size=4, shuffle=True, num_workers=4)

dataloaders = {'train': trainDataloader, 'val': valDataloader}

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

num_classes = 2 # background and wires

# get the model
model = createDeepLabv3(num_classes)

# move model to the right device
model.to(device)

# Count background (0) and wire (non-zero) pixels over the training set
zerocount = 0
onecount = 0
for i in range(len(trainDataset)):
    msk = trainDataset[i]['mask']
    zerocount += (msk == 0).sum().item()
    onecount += (msk != 0).sum().item()

weightForZeroClass = 0.5 / zerocount # Also tried these values
weightForOneClass = 0.5 / onecount # Also tried these values

criterion = nn.NLLLoss(reduction='mean', weight=torch.tensor([0.01, 1]).to(device)) # Tried these values randomly

# construct an optimizer
params = [p for p in model.parameters() if p.requires_grad]

#optimizer = torch.optim.SGD(params, lr=0.005,momentum=0.9, weight_decay=0.0005)
optimizer = optim.Adam(params, lr=1e-3, betas=(0.9, 0.999), eps=1e-08, weight_decay=0.0005, amsgrad=False)

# and a learning rate scheduler which decreases the learning rate by
# 10x every 3 epochs
scheduler = torch.optim.lr_scheduler.StepLR(optimizer,step_size=3,gamma=0.1)

total_epoch = 20

finalTrainedModelSpan, lossDictSpan = train_model(model, criterion, optimizer, dataloaders, device, scheduler, total_epoch)

## Code to predict and display the predicted mask

#i = 3
#with torch.no_grad():
#    preds = model(testDataset[i]['image'].reshape(1,3,576,378).to(device))
#displayPreds = torch.argmax(preds['out'], 1)
#predMaskToDisplay = displayPreds.cpu().numpy().reshape(576,378)
#actualMaskToDisplay = testDataset[i]['mask'].numpy()
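
On the question of passing weights to nn.NLLLoss: the `weight` argument takes a 1-D tensor with one entry per class, as in the criterion line above. Two things may be worth checking in a setup like this one. First, nn.NLLLoss expects log-probabilities, while torchvision segmentation models return raw scores in outputs['out']. Second, the masks must contain class indices 0..C-1, so a mask stored as 0/255 would need remapping. A minimal sketch with random stand-in tensors (the shapes and weight values are assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical weights: down-weight background (class 0), keep wires (class 1) at 1.
class_weights = torch.tensor([0.01, 1.0])
criterion = nn.NLLLoss(weight=class_weights, reduction="mean")

# Stand-ins for model(imgs)['out'] and the mask batch.
logits = torch.randn(4, 2, 32, 32)           # raw scores, [N, C, H, W]
masks = torch.randint(0, 2, (4, 32, 32))     # class indices, [N, H, W]

# NLLLoss expects log-probabilities, so apply log_softmax over the class dimension first.
loss_nll = criterion(F.log_softmax(logits, dim=1), masks)

# Equivalent: CrossEntropyLoss applies log_softmax internally and takes raw scores.
loss_ce = nn.CrossEntropyLoss(weight=class_weights)(logits, masks)

print(loss_nll.item(), loss_ce.item())
```

Either form should work; switching to nn.CrossEntropyLoss simply avoids having to remember the log_softmax step.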