Generating Pascal VOC 2012 dataset

Hello community,
I am trying to implement Fully Convolutional Networks (FCN) for semantic segmentation on the Pascal VOC 2012 dataset. I am having trouble loading the dataset. My questions are as follows:

  1. In FCN-32 the output has dimensions H x W x num_classes. Does this mean I have to convert my ground-truth segmentation maps to H x W x num_classes? If so, how can I generate one-hot encoded ground-truth images?
  2. Which loss function is used for this kind of task apart from IoU? Can categorical cross-entropy be used here?

1. No. Load the ground truth as H x W and let your network output H x W x num_classes.
2. Cross-entropy loss is fine here (this is a dense, pixel-level classification problem).
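To illustrate why an H x W label map is enough, here is a minimal NumPy sketch of per-pixel cross-entropy that takes integer class indices directly. The function name `pixelwise_cross_entropy` and the `ignore_index` parameter are illustrative, not from any particular library. Skipping pixels labelled 255 assumes the usual VOC convention of marking void/boundary pixels with that value.

```python
import numpy as np

def pixelwise_cross_entropy(logits, labels, ignore_index=255):
    """Mean cross-entropy over valid pixels.

    logits: float array of shape (H, W, num_classes), raw network scores.
    labels: int array of shape (H, W), one class index per pixel.
    Pixels whose label equals ignore_index are skipped (assumed VOC void label).
    """
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

    valid = labels != ignore_index
    # Replace ignored labels with 0 so they are legal indices; they are masked out below.
    safe_labels = np.where(valid, labels, 0)
    # Pick the log-probability of the true class at every pixel.
    picked = np.take_along_axis(log_probs, safe_labels[..., None], axis=-1)[..., 0]
    return -picked[valid].mean()

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 5, 21))       # H=4, W=5, 21 VOC classes
labels = rng.integers(0, 21, size=(4, 5))  # integer label map, no one-hot needed
loss = pixelwise_cross_entropy(logits, labels)
```

The label map is only used to index into the class axis, so it never needs to be expanded to H x W x num_classes.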

But if I keep the network output as H x W x num_classes and the ground truth as H x W, I get an error saying that y_pred should have the same shape as y_target.
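If the loss function in use insists on matching shapes, one option is to one-hot encode the label map so it becomes H x W x num_classes. The other option is to switch to a loss variant that accepts integer labels, if the framework provides one. The sketch below covers the one-hot route in NumPy. The helper name `one_hot_mask` is hypothetical, and mapping label 255 to an all-zero vector is an assumption about how void pixels should be handled.

```python
import numpy as np

def one_hot_mask(labels, num_classes=21, ignore_index=255):
    """Convert an (H, W) integer label map to an (H, W, num_classes) one-hot array."""
    valid = labels != ignore_index
    # Temporarily map ignored pixels to class 0 so the lookup below is in range.
    safe = np.where(valid, labels, 0)
    # Row i of the identity matrix is the one-hot vector for class i.
    encoded = np.eye(num_classes, dtype=np.float32)[safe]
    # Assumed handling: void pixels get an all-zero target vector.
    encoded[~valid] = 0.0
    return encoded

labels = np.array([[0, 15],
                   [255, 7]])
target = one_hot_mask(labels)
print(target.shape)  # (2, 2, 21)
```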

@lxtGH Thanks for the advice, it worked. I can now run the entire network, but I am getting a very high loss (~3.0 to 1.8). Also, since the output has shape H x W x num_classes, how should I plot it to visualize my predictions?
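For context on those loss values: assuming the loss is mean per-pixel cross-entropy with the natural logarithm, a model that spreads its probability evenly over all 21 classes scores ln(21). A starting loss near 3.0 is therefore about what an untrained network would give. A quick check:

```python
import math

num_classes = 21  # 20 VOC object classes plus background
# Cross-entropy of a uniform prediction: -log(1 / num_classes)
uniform_loss = -math.log(1.0 / num_classes)
print(round(uniform_loss, 4))  # 3.0445
```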

First collapse the H x W x num_classes output into an H x W map with argmax. By the definition of semantic segmentation, each pixel belongs to one class, so you can give each pixel a color (RGB), with one color per class.

So this means applying argmax over the num_classes dimension to get an H x W image and then converting it into an RGB image?

Yes, you are right. Assign each pixel a color according to its class index.
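The argmax-then-color step described above can be sketched as follows. The palette is built with the bit-shuffling scheme commonly used for the VOC colormap. The function names `voc_colormap` and `colorize` are illustrative, and the random scores stand in for a real network output.

```python
import numpy as np

def voc_colormap(num_colors=256):
    """Build an RGB palette of shape (num_colors, 3), one color per class index."""
    cmap = np.zeros((num_colors, 3), dtype=np.uint8)
    for i in range(num_colors):
        r = g = b = 0
        c = i
        for j in range(8):
            # Spread the bits of the class index across the three channels.
            r |= ((c >> 0) & 1) << (7 - j)
            g |= ((c >> 1) & 1) << (7 - j)
            b |= ((c >> 2) & 1) << (7 - j)
            c >>= 3
        cmap[i] = (r, g, b)
    return cmap

def colorize(scores):
    """Map (H, W, num_classes) scores to an (H, W, 3) uint8 RGB image."""
    class_map = scores.argmax(axis=-1)  # (H, W), one class index per pixel
    return voc_colormap()[class_map]    # palette lookup per pixel

rng = np.random.default_rng(0)
scores = rng.normal(size=(4, 6, 21))  # placeholder for a network prediction
rgb = colorize(scores)
print(rgb.shape, rgb.dtype)  # (4, 6, 3) uint8
```

The resulting array can be passed to any image viewer or plotting routine that accepts H x W x 3 uint8 data.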

@lxtGH But argmax will give only one maximum among all 21 values. How would I arrange them?
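On the question of arranging the values: when argmax is taken along the class axis only, it returns one index per pixel rather than one value for the whole image, so the result is already laid out as an H x W map. A tiny example with made-up scores:

```python
import numpy as np

scores = np.zeros((2, 3, 21))  # H=2, W=3, 21 classes
scores[0, 0, 15] = 5.0         # pixel (0, 0) scores highest for class 15
scores[1, 2, 7] = 2.0          # pixel (1, 2) scores highest for class 7

class_map = scores.argmax(axis=-1)  # reduce only the class axis
print(class_map.shape)  # (2, 3)
print(class_map)
# [[15  0  0]
#  [ 0  0  7]]
```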

@lxtGH I used res.argmax(-1), where res is my 21-channel prediction. The result of res.argmax(-1) is 0 everywhere, and the output looks like this:

This is my model structure for FCN-32:

These are my hyper-parameter settings:

Have I built the model correctly? Why am I getting all-zero predictions?

How many iterations have you trained the model for? I was once in a similar situation, but after enough epochs the model started to output segmentations.
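To see whether a prediction really is background everywhere (class 0 in VOC) or merely dominated by it, one rough check is to count how many pixels fall into each predicted class. The helper name `class_histogram` is hypothetical, and the all-zero array mimics the blank output described above.

```python
import numpy as np

def class_histogram(class_map, num_classes=21):
    """Fraction of pixels assigned to each class in an (H, W) index map."""
    counts = np.bincount(class_map.ravel(), minlength=num_classes)
    return counts / counts.sum()

pred = np.zeros((4, 4), dtype=np.int64)  # stand-in for an all-background prediction
hist = class_histogram(pred)
print(hist[0])                # 1.0, every pixel is class 0
print(np.count_nonzero(hist)) # 1, only one class is ever predicted
```

Running the same count after more epochs shows whether other classes are starting to appear.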

@kaixin I am running this model for 1 to 4 epochs. Should I increase the number of epochs? What might be an optimal number of epochs to train for?

@keyur_paralkar The number of epochs mentioned in the paper is 175. Try 50 or more; I think you will get something other than a blank mask.

@kaixin and @lxtGH Thank you both, I am now able to get predictions with the FCN-32 model.