Question about output and label channels in semantic segmentation

I use some nets, FCN8 and SegNet, for semantic segmentation. My trouble is the following:
For all of the nets I used, the last layer outputs feature maps of shape (1, 22, 256, 256). Why not (1, 3, 256, 256)? Another question: the label size is (1, 1, 256, 256). Why not (1, 3, 256, 256)?
Please help me, I am a newbie.

The output and label shape depend on your classification problem. If the last layer outputs an activation of [batch_size, 22, width, height], it probably means that the model was used to classify each pixel into one of 22 classes.
You should change the number of output channels to your use case.
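As a rough sketch (the layer below is a generic stand-in, not the actual FCN8/SegNet head), the number of output channels is set by the final 1x1 convolution:

import torch
import torch.nn as nn

num_classes = 3  # set this to the number of classes in your problem

# hypothetical final 1x1 convolution of a fully-convolutional net:
# it maps the feature channels to one output channel per class
final_conv = nn.Conv2d(in_channels=256, out_channels=num_classes, kernel_size=1)

features = torch.randn(1, 256, 64, 64)  # [batch, channels, height, width]
out = final_conv(features)
print(out.shape)  # torch.Size([1, 3, 64, 64])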

Edit: I edited your title so that the question becomes clearer.

22 = 21 (classes) + 1 (background) in the PASCAL VOC dataset.

Thank you for your help. I figured it out by debugging.

Thank you for your help, thank you very much. I understand it now.

I am sorry, I want to ask you another question. The label size is (batch_size, 1, width, height), which means the label has one channel. So is it a picture?
Also, the output size is (batch_size, class_num, width, height). Does the cost function then compute the loss between every channel of the output and the single channel of the label?

Yes, the label seems to be a picture. You could try to visualize it using something like:

import matplotlib.pyplot as plt
plt.imshow(label[0, 0, ...].data.numpy())

When you are dealing with a multi-class problem, you could use NLLLoss or CrossEntropyLoss. Have a look at the doc.

Here is a small example with 22 classes and a binary example:

import torch
import torch.nn as nn
import torch.nn.functional as F

# Multi-class example: nn.NLLLoss expects log-probabilities,
# so apply log_softmax over the class dimension first
x = F.log_softmax(torch.randn(1, 22, 10, 10), dim=1)
y = torch.empty(1, 10, 10, dtype=torch.long).random_(22)

criterion = nn.NLLLoss()
loss = criterion(x, y)

# Binary example: nn.BCELoss expects probabilities in [0, 1]
x = torch.sigmoid(torch.randn(1, 1, 10, 10))
y = torch.empty(1, 1, 10, 10).random_(2)

criterion = nn.BCELoss()
loss = criterion(x, y)
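As a side note, nn.CrossEntropyLoss combines log_softmax and NLLLoss, and nn.BCEWithLogitsLoss combines the sigmoid and BCELoss, so both examples can also be written with the raw logits:

# Multi-class with raw logits
criterion = nn.CrossEntropyLoss()
loss = criterion(torch.randn(1, 22, 10, 10),
                 torch.empty(1, 10, 10, dtype=torch.long).random_(22))

# Binary with raw logits
criterion = nn.BCEWithLogitsLoss()
loss = criterion(torch.randn(1, 1, 10, 10),
                 torch.empty(1, 1, 10, 10).random_(2))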

Thank you for your help, good friend.


I am facing similar problems. What should the input dimension be in the case of multi-class labels? I have images with 3 channels and labels with 1 channel and 12 classes.
I get output predictions with 12 channels. How do I tackle this?
Thanks in advance!

If you are dealing with a multi-class classification, you could use nn.CrossEntropyLoss with a model output of [batch_size, nb_classes, h, w] and a target of [batch_size, h, w] containing the class indices.
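For example, with 12 classes (a minimal sketch using random data):

import torch
import torch.nn as nn

nb_classes = 12
output = torch.randn(2, nb_classes, 512, 512)  # raw logits from the model
target = torch.empty(2, 512, 512, dtype=torch.long).random_(nb_classes)  # class indices

criterion = nn.CrossEntropyLoss()
loss = criterion(output, target)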
How is your current target image defined?

Thank you for your reply.
Sorry for the incomplete information.

This is a semantic segmentation problem

Input data: Image is in .jpg format. I use transform.ToTensor() to convert this.
Target: Annotated .png image with 12 classes. I convert it using torch.from_numpy(np.array(mask)).long()

Following is the code I am working on.

data loader class

import os
import numpy as np
import torch
from PIL import Image

class GetDataset(torch.utils.data.Dataset):
    def __init__(self, root, transformation=None):
        self.root = root
        self.transformation = transformation
        # load all image files, sorting them to
        # ensure that they are aligned
        self.imgs = list(sorted(os.listdir(os.path.join(root, "JPEGImages_test"))))
        self.masks = list(sorted(os.listdir(os.path.join(root, "SegmentationClass_test"))))

    def __getitem__(self, idx):
        # load images and masks
        img_path = os.path.join(self.root, "JPEGImages_test", self.imgs[idx])
        mask_path = os.path.join(self.root, "SegmentationClass_test", self.masks[idx])

        img = Image.open(img_path)
        mask = Image.open(mask_path)

        img_t = self.transformation(img)
        mask_t = torch.from_numpy(np.array(mask)).long()
        return img_t, mask_t

    def __len__(self):
        return len(self.imgs)
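For reference, a minimal usage sketch for this Dataset could look like the following (the root path is a placeholder):

from torch.utils.data import DataLoader
from torchvision import transforms

dataset = GetDataset(root="path/to/data", transformation=transforms.ToTensor())
data_loader = DataLoader(dataset, batch_size=4, shuffle=True)

imgs, masks = next(iter(data_loader))
print(imgs.shape, masks.shape)  # expect [4, 3, H, W] and [4, H, W] for index masks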

==> Tried both losses

#criterion = nn.CrossEntropyLoss()
criterion = nn.NLLLoss()
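Note that nn.NLLLoss expects log-probabilities, while nn.CrossEntropyLoss works on the raw logits directly, so switching between them also changes what the model output has to look like (a small sketch with random data):

import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.randn(1, 12, 512, 512)  # raw model output, e.g. outputs['out']
target = torch.empty(1, 512, 512, dtype=torch.long).random_(12)

loss_ce = nn.CrossEntropyLoss()(logits, target)                # takes raw logits
loss_nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), target)  # takes log-probabilities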

Train function

def train_model(model, criterion, optimizer, dataloaders, scheduler, num_epochs=1):
    import copy

    best_model_wts = copy.deepcopy(model.state_dict())
    print(num_epochs)
    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)
        model.train()  # Set model to training mode

        for inputs, labels in dataloaders:
            inputs = inputs.to(device)
            labels = labels.to(device)
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs['out'], labels)
            loss.backward()
            optimizer.step()

        best_model_wts = copy.deepcopy(model.state_dict())

    # load best model weights
    model.load_state_dict(best_model_wts)
    return model

#Train model
model_ft = train_model(model_segdl, criterion, optimizer_ft,data_loader, exp_lr_scheduler, num_epochs=1)

model_ft.eval()
x = torch.rand(1, 3, 300, 400)
predictions = model_ft(x)

predictions['out'].size()
Out[59]: torch.Size([1, 12, 300, 400])

Is this correct? If not, what am I missing?

Could you print the shape of mask_t?
How are your masks stored? Is each class corresponding to a certain color in an RGB image or do the image files already contain class indices?
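If the files turn out to be color-coded, a common approach is to map each color to a class index before training; a rough sketch (the palette below is made up):

import numpy as np
import torch
from PIL import Image

# hypothetical color -> class index mapping; adapt it to your dataset's palette
color_to_class = {
    (0, 0, 0): 0,      # background
    (128, 0, 0): 1,
    (0, 128, 0): 2,
}

mask_rgb = np.array(Image.open("mask.png").convert("RGB"))  # [H, W, 3]
mask_idx = np.zeros(mask_rgb.shape[:2], dtype=np.int64)     # [H, W]
for color, cls in color_to_class.items():
    mask_idx[(mask_rgb == color).all(axis=-1)] = cls

target = torch.from_numpy(mask_idx)  # class indices, shape [H, W]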

Yes, I have indices for each class. mask_t are essentially the labels; when I load them from the dataloader, the shape is as follows.
Although I used label inputs with a dimension of [1, 512, 512] (which is my .png file) and converted them to long using torch.from_numpy(np.array(mask)).long()

ipdb> labels.size()
torch.Size([3, 512, 512])

ipdb> labels
tensor([[[2, 2, 2,  ..., 2, 2, 2],
         [2, 2, 2,  ..., 2, 2, 2],
         [2, 2, 2,  ..., 2, 2, 2],
         ...,
         [2, 2, 2,  ..., 2, 2, 2],
         [2, 2, 2,  ..., 2, 2, 2],
         [2, 2, 2,  ..., 2, 2, 2]],

        [[2, 2, 2,  ..., 2, 2, 2],
         [2, 2, 2,  ..., 2, 2, 2],
         [2, 2, 2,  ..., 2, 2, 2],
         ...,
         [9, 9, 9,  ..., 2, 2, 2],
         [9, 9, 9,  ..., 2, 2, 2],
         [9, 9, 9,  ..., 2, 2, 2]],

        [[1, 1, 1,  ..., 2, 2, 2],
         [1, 1, 1,  ..., 2, 2, 2],
         [1, 1, 1,  ..., 2, 2, 2],
         ...,
         [1, 1, 1,  ..., 2, 2, 2],
         [1, 1, 1,  ..., 2, 2, 2],
         [1, 1, 1,  ..., 2, 2, 2]]])

Based on the example output, it looks like each channel contains different values.
Are you sure that the mask image does not contain a color code?
Could you try to call target.view(target.size(0), -1).unique(dim=1) and post the unique color values here?

I had converted the image to grayscale. Shall I use the annotated .png image instead?

I’m not sure what the annotated .png images are, but your image does not seem to be grayscale, as the channels contain different values.
What did the unique call return?

labels.view(labels.size(0), -1).unique(dim=1)
tensor([[ 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8,
8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
8, 9, 9, 9, 9, 9, 9, 9, 9],
[ 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 5,
5, 5, 5, 5, 6, 6, 6, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 9,
9, 9, 9, 9, 9, 9, 9, 10, 10, 10, 10, 11, 11, 11, 11, 11, 2, 2,
2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 6, 6, 7, 7, 7, 7,
8, 8, 8, 8, 9, 9, 9, 9, 9, 11, 11, 2, 2, 2, 3, 3, 3, 9,
9, 9, 10, 10, 11, 11, 2, 2, 2, 5, 6, 7, 7, 7, 9, 9, 10, 11,
11, 2, 2, 4, 4, 7, 7, 9, 9, 10, 10, 10, 11, 11, 2, 2, 2, 2,
2, 2, 3, 3, 3, 3, 3, 3, 6, 7, 7, 8, 8, 8, 8, 9, 9, 9,
9, 2, 2, 3, 3, 7, 7, 10, 10],
[ 1, 2, 3, 5, 6, 7, 8, 9, 2, 3, 6, 7, 8, 9, 2, 7, 9, 1,
2, 3, 8, 9, 1, 2, 6, 2, 5, 6, 7, 9, 2, 3, 6, 8, 9, 1,
2, 3, 5, 6, 7, 8, 9, 1, 2, 3, 5, 2, 3, 5, 6, 7, 2, 3,
5, 6, 7, 8, 9, 2, 3, 6, 7, 8, 9, 7, 2, 6, 2, 5, 6, 7,
2, 3, 6, 9, 2, 3, 5, 6, 7, 3, 7, 2, 3, 8, 2, 3, 8, 2,
3, 8, 2, 3, 2, 3, 2, 5, 9, 2, 2, 2, 5, 9, 2, 5, 2, 2,
5, 2, 3, 2, 3, 2, 5, 2, 3, 2, 3, 5, 2, 3, 2, 3, 6, 7,
8, 9, 2, 3, 6, 7, 8, 9, 2, 2, 6, 2, 3, 6, 9, 2, 3, 6,
7, 2, 9, 2, 9, 2, 9, 2, 3]])

I definitely sense a problem here.

It looks a bit strange.
I assumed your tensor has a shape of [channels, height, width], which should yield a clean output of the different color values.
Are you using an additional batch dimension?
If so, could you slice the tensor and run the code again?
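For example, if the shape is [batch_size, channels, height, width], the first sample could be checked with:

sample = labels[0]  # drop the batch dimension -> [channels, height, width]
print(sample.view(sample.size(0), -1).unique(dim=1))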

Anyway, the values look a bit fishy and something might be wrong.
Could you also try to plot the image using e.g. matplotlib?

That's a sample .png image:

plt.imshow(img)
Out[10]: <matplotlib.image.AxesImage at 0x1288d4da0>
[image: plot of the sample mask]

img.size
Out[12]: (512, 512)

np.unique(np.array(img))
Out[13]: array([2, 3, 5, 6, 7, 8, 9], dtype=uint8)

This looks perfectly fine.
Could you use this target image, try to recreate the loading logic from your Dataset, and check which method creates these pseudo-values?
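A quick way to do that is to run the loading steps one at a time on this file and print the intermediate results (a sketch; the path is a placeholder):

import numpy as np
import torch
from PIL import Image

mask = Image.open("SegmentationClass_test/sample.png")  # hypothetical path
print(mask.mode)  # 'L' or 'P' is expected for index masks, 'RGB' is not

mask_np = np.array(mask)
print(mask_np.shape, np.unique(mask_np))  # should be [H, W] with the class indices

mask_t = torch.from_numpy(mask_np).long()
print(mask_t.shape, mask_t.unique())  # should match the numpy output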