Understanding the label/target process loading for semantic segmentation

So I have been teaching myself PyTorch for semantic segmentation using FCN.

I started by learning about the Dataset class and DataLoaders, and made a simple network that could classify the MNIST dataset.

I then moved on to FCN and coded the network architecture from the paper, from the provided diagram, and from looking at some examples on GitHub.

It's the data-loading part that has me confused.

I understand that the Torch DataLoader takes a Dataset class that we have to write ourselves. The Dataset class has to return an image and its respective label from __getitem__(self, idx). Returning the image is easy; it's the label of said image that is confusing me.

With MNIST each image had a simple label, but with semantic segmentation each image is divided into multiple colours, and those colours represent the segmentation. How do I feed this into the network?

I am using the CamVid dataset, which provides the raw images, the colour-labelled images, and a text file giving the colour of each label.

For example, if I had to use the image and its respective segmented image, I would write my Dataset class as:

    from os.path import join
    from PIL import Image
    from torch.utils.data import Dataset

    class CamVid(Dataset):

        def __init__(self, filenames, labels, root_dir, transform=None):
            assert len(filenames) == len(labels)   # throw an error if the two are not of equal length
            self.filenames = filenames
            self.labels = labels
            self.root_dir = root_dir
            self.transform = transform

        def __len__(self):
            return len(self.filenames)

        def __getitem__(self, idx):
            this_img = join(self.root_dir, self.filenames[idx] + '.png')
            img = Image.open(this_img)
            this_label = join(self.root_dir, self.labels[idx] + '.png')
            label = Image.open(this_label)

            if self.transform:
                img = self.transform(img)

            return img, label

This will return the image and its segmented counterpart, and I could feed this to my network, but I don't want to, as I don't understand what the network would learn from it. I know I have to incorporate the label colour codes from the text file, but I don't know how or at what stage…
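For context on what the network actually needs: with nn.CrossEntropyLoss the target is not the colour image itself but an (H, W) LongTensor holding a class index per pixel. A minimal sketch of the expected shapes (the tensors here are random placeholders, not real CamVid data; 32 is the number of entries in the colour-code file below):

```python
import torch
import torch.nn as nn

num_classes = 32  # one class per line of the CamVid colour-code file

# Placeholder network output: (batch, num_classes, H, W) raw scores per pixel
logits = torch.randn(4, num_classes, 360, 480)

# Placeholder target: (batch, H, W) with a class *index* (0..31) per pixel,
# not an RGB colour -- this is what the colour-coded label image has to be
# converted into before training
target = torch.randint(0, num_classes, (4, 360, 480), dtype=torch.long)

criterion = nn.CrossEntropyLoss()
loss = criterion(logits, target)  # a single scalar for the batch
```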

Here is the .txt file with the colour codes:

64 128 64	Animal
192 0 128	Archway
0 128 192	Bicyclist
0 128 64	Bridge
128 0 0		Building
64 0 128	Car
64 0 192	CartLuggagePram
192 128 64	Child
192 192 128	Column_Pole
64 64 128	Fence
128 0 192	LaneMkgsDriv
192 0 64	LaneMkgsNonDriv
128 128 64	Misc_Text
192 0 192	MotorcycleScooter
128 64 64	OtherMoving
64 192 128	ParkingBlock
64 64 0		Pedestrian
128 64 128	Road
128 128 192	RoadShoulder
0 0 192		Sidewalk
192 128 128	SignSymbol
128 128 128	Sky
64 128 192	SUVPickupTruck
0 0 64		TrafficCone
0 64 64		TrafficLight
192 64 128	Train
128 128 0	Tree
192 128 192	Truck_Bus
64 0 64		Tunnel
192 192 0	VegetationMisc
0 0 0		Void
64 192 0	Wall
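A file like this can be parsed into an ordered class list and a colour-to-index lookup. A minimal sketch, where parse_label_colors and the label_colors.txt filename are my own hypothetical names (each line is three RGB integers followed by the class name):

```python
def parse_label_colors(lines):
    """Parse lines of 'R G B Name' into (class_names, color_to_index)."""
    class_names = []
    color_to_index = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 4:  # skip blank or malformed lines
            continue
        r, g, b = (int(v) for v in parts[:3])
        # the class index is simply the position of the line in the file
        color_to_index[(r, g, b)] = len(class_names)
        class_names.append(parts[3])
    return class_names, color_to_index

# Usage, assuming the file is saved as 'label_colors.txt':
# with open('label_colors.txt') as f:
#     class_names, color_to_index = parse_label_colors(f)
```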

I’m really new to this and some guidance would be appreciated.

Many thanks

You could use this code to transform your color label images to label images containing class indices.
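For example, something along these lines — a sketch of that conversion using NumPy, where encode_segmap is a hypothetical name, mask is the label image loaded as an (H, W, 3) uint8 array, and colors is the list of RGB tuples in the same order as the text file:

```python
import numpy as np

def encode_segmap(mask, colors):
    """Map an (H, W, 3) RGB label image to an (H, W) array of class indices."""
    label = np.zeros(mask.shape[:2], dtype=np.int64)
    for index, color in enumerate(colors):
        # boolean (H, W) mask of the pixels matching this class colour
        matches = np.all(mask == np.array(color), axis=-1)
        label[matches] = index
    return label
```

In the Dataset above this would replace the raw label image, e.g. `label = torch.from_numpy(encode_segmap(np.array(label), colors))`, so that `__getitem__` returns class indices instead of colours.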
