So I have been teaching myself PyTorch for semantic segmentation using FCN.
I started by learning about the Dataset class and DataLoaders and made a simple network that could classify the MNIST dataset.
I then moved to FCN and coded the network architecture from the paper, the provided diagram, and some examples on GitHub.
It's the data loading part that has me confused.
I understand that the Torch DataLoader takes a Dataset class that we have to write ourselves. The Dataset class has to return an image and its respective label from __getitem__(self, idx). Returning the image is easy; it's the label of that image that is confusing me.
With MNIST each image had a simple label, but with semantic segmentation each image is divided into multiple colours and those colours represent the segmentation. How do I feed this into the network?
I am using the CamVid dataset, which provides the raw images, the colour-coded label images, and a text file giving the colour of each label.
For example, if I have to use the image and its respective segmented image, I would write my Dataset class as:
from os.path import join

from PIL import Image
from torch.utils.data import Dataset


class CamVid(Dataset):
    def __init__(self, filenames, labels, root_dir, transform=None):
        # Every image needs exactly one label image
        assert len(filenames) == len(labels)
        self.filenames = filenames
        self.labels = labels
        self.root_dir = root_dir
        self.transform = transform

    def __len__(self):
        return len(self.filenames)

    def __getitem__(self, idx):
        # Raw input image
        this_img = join(self.root_dir, self.filenames[idx] + '.png')
        img = Image.open(this_img)
        # Colour-coded segmentation image for the same frame
        this_label = join(self.root_dir, self.labels[idx] + '.png')
        label = Image.open(this_label)
        if self.transform:
            img = self.transform(img)
        return [img, label]
This will return the image and its segmented counterpart, and I could feed this to my network, but I don't want to, as I don't understand what the network would learn from it. I know I have to incorporate the label colour codes from the text file, but I don't know how or at what stage.
Here is the .txt file with the colour codes:
64 128 64 Animal
192 0 128 Archway
0 128 192 Bicyclist
0 128 64 Bridge
128 0 0 Building
64 0 128 Car
64 0 192 CartLuggagePram
192 128 64 Child
192 192 128 Column_Pole
64 64 128 Fence
128 0 192 LaneMkgsDriv
192 0 64 LaneMkgsNonDriv
128 128 64 Misc_Text
192 0 192 MotorcycleScooter
128 64 64 OtherMoving
64 192 128 ParkingBlock
64 64 0 Pedestrian
128 64 128 Road
128 128 192 RoadShoulder
0 0 192 Sidewalk
192 128 128 SignSymbol
128 128 128 Sky
64 128 192 SUVPickupTruck
0 0 64 TrafficCone
0 64 64 TrafficLight
192 64 128 Train
128 128 0 Tree
192 128 192 Truck_Bus
64 0 64 Tunnel
192 192 0 VegetationMisc
0 0 0 Void
64 192 0 Wall
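For what it's worth, here is a minimal sketch of how these colour codes might be used, not a definitive implementation. It assumes each line of the file is "R G B Name", that a class index is simply the row position in the file, and that any pixel colour not listed should get a placeholder index of 255. The names parse_label_colours, rgb_to_class_index and ignore_index are made up for illustration, and only a few rows of the file are inlined so the snippet runs on its own.

```python
import numpy as np

# A few rows of the colour-code file above, inlined so the sketch is self-contained.
LABEL_COLOURS_TXT = """\
64 128 64 Animal
192 0 128 Archway
128 0 0 Building
128 64 128 Road
128 128 128 Sky
0 0 0 Void
"""


def parse_label_colours(text):
    """Return (names, colours): class names and a (num_classes, 3) uint8 RGB array."""
    names, colours = [], []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 4:
            continue  # skip blank or malformed lines
        r, g, b = (int(v) for v in parts[:3])
        colours.append((r, g, b))
        names.append(parts[3])
    return names, np.array(colours, dtype=np.uint8)


def rgb_to_class_index(label_rgb, colours, ignore_index=255):
    """Map an (H, W, 3) RGB label image to an (H, W) array of class indices."""
    # Start with every pixel marked as "unknown" (assumed placeholder value).
    mask = np.full(label_rgb.shape[:2], ignore_index, dtype=np.int64)
    for idx, colour in enumerate(colours):
        # True wherever all three channels equal this class colour
        matches = np.all(label_rgb == colour, axis=-1)
        mask[matches] = idx
    return mask


names, colours = parse_label_colours(LABEL_COLOURS_TXT)

# Tiny 2x2 stand-in for a label image: two Sky pixels, one Road, one Building.
label_rgb = np.array(
    [[[128, 128, 128], [128, 128, 128]],
     [[128, 64, 128], [128, 0, 0]]],
    dtype=np.uint8,
)
class_mask = rgb_to_class_index(label_rgb, colours)
```

Under these assumptions, the conversion would happen inside __getitem__: the label image is opened, converted with np.array(label.convert('RGB')), mapped to class indices, and returned as an integer tensor (for example via torch.from_numpy). A loss such as nn.CrossEntropyLoss accepts network output of shape (N, C, H, W) together with a target of shape (N, H, W) holding one class index per pixel, and its ignore_index argument can be set to the placeholder value.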
I’m really new to this and some guidance would be appreciated.
Many thanks