Generating Pascal VOC 2012 dataset

keyur_paralkar · September 6, 2018, 10:02am

Hello community,
I am trying to implement Fully Convolutional Networks (FCN) for semantic segmentation on the Pascal VOC 2012 dataset. I am having trouble loading the dataset. My questions are as follows:

1. In FCN-32 the output has dimensions H x W x num_classes. Does this mean I have to convert my ground-truth segmentation maps to H x W x num_classes? If yes, how can I generate one-hot encoded ground-truth images?
2. Which loss function is used for this kind of task, apart from IoU? Can categorical cross-entropy be used here?

lxtGH · September 6, 2018, 11:42am

1. No. Load the ground truth as H * W and let your network output H * W * num_classes.
2. Cross-entropy loss is fine; this is a dense, pixel-level classification problem.
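
To make the shapes concrete, here is a minimal NumPy sketch of per-pixel cross-entropy that takes H * W * num_classes scores and an H * W map of integer class indices, so no one-hot conversion is needed. The function name and the `ignore_index=255` default are assumptions for illustration (255 is the void/boundary label in the VOC segmentation masks); most frameworks ship their own version of this loss.

```python
import numpy as np

def pixelwise_cross_entropy(logits, labels, ignore_index=255):
    """Mean cross-entropy over pixels.

    logits: float array, shape (H, W, num_classes), raw network scores.
    labels: int array, shape (H, W), one class index per pixel.
    ignore_index: label value to skip (assumed 255, the VOC void label).
    """
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

    valid = labels != ignore_index
    # Point ignored pixels at class 0 so indexing stays in range;
    # they are masked out of the sum below.
    safe_labels = np.where(valid, labels, 0)
    picked = np.take_along_axis(log_probs, safe_labels[..., None], axis=-1)[..., 0]
    return -(picked * valid).sum() / max(valid.sum(), 1)

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 5, 21))       # H=4, W=5, 21 classes
labels = rng.integers(0, 21, size=(4, 5))  # integer mask, no one-hot
loss = pixelwise_cross_entropy(logits, labels)
```

The loss picks out the log-probability of the true class at each pixel, which gives the same value as multiplying by a one-hot target and summing over classes.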

keyur_paralkar · September 6, 2018, 11:55am

But if I keep the network output as H x W x num_classes and the ground truth as H x W, I get an error saying that y_pred should have the same shape as y_target.
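
If the loss being used insists on matching shapes, one possible workaround is to one-hot encode the mask so it becomes H x W x num_classes. The sketch below is framework-neutral NumPy; `one_hot_mask` is a hypothetical helper, and it assumes every label is already in the range 0 to num_classes - 1, so VOC void pixels (label 255) would have to be remapped or masked first.

```python
import numpy as np

def one_hot_mask(labels, num_classes=21):
    """Turn an (H, W) integer mask into an (H, W, num_classes) one-hot array."""
    # Indexing an identity matrix with the label map picks one row per pixel.
    return np.eye(num_classes, dtype=np.float32)[labels]

labels = np.array([[0, 1], [15, 20]])  # toy 2 x 2 mask
target = one_hot_mask(labels)          # shape (2, 2, 21)
```

Taking `target.argmax(-1)` recovers the original integer mask, so the two representations carry the same information.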

keyur_paralkar · September 7, 2018, 1:07pm

@lxtGH Thanks for the advice, it worked. I am now able to run the entire network, but I am getting a very high loss (~3.0 to 1.8). Also, since the output has shape H * W * num_classes, how should I plot it to visualize my predictions?
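
For scale, assuming the loss is mean per-pixel categorical cross-entropy with the natural logarithm, a model that spreads its probability evenly over all 21 classes scores ln 21, so a starting value near 3.0 is roughly what an untrained network would be expected to give:

```python
import math

num_classes = 21  # 20 VOC object classes plus background
uniform_loss = -math.log(1.0 / num_classes)
print(round(uniform_loss, 3))  # 3.045
```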

lxtGH · September 10, 2018, 8:41am

First collapse the H * W * num_classes output into an H * W map with argmax. By the definition of semantic segmentation, each pixel belongs to one class, so you can give each pixel an RGB color, with one color per class.

keyur_paralkar · September 10, 2018, 10:53am

So does this mean applying argmax over the ‘num_class’ dimension to get an H * W image and then converting it into an RGB image?

lxtGH · September 10, 2018, 10:55am

Yes, you are right. Assign the color according to the class index.
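
A minimal NumPy sketch of that step: argmax over the class axis, then a palette lookup. The palette is generated with the bit-shuffling scheme commonly used for the VOC colormap; the function names are illustrative, and any table with one RGB row per class would work the same way.

```python
import numpy as np

def voc_colormap(n=256):
    """Build an (n, 3) uint8 palette with the usual VOC bit-shuffling scheme."""
    cmap = np.zeros((n, 3), dtype=np.uint8)
    for i in range(n):
        c, r, g, b = i, 0, 0, 0
        for j in range(8):
            # Spread the bits of the class index across the R, G, B channels.
            r |= ((c >> 0) & 1) << (7 - j)
            g |= ((c >> 1) & 1) << (7 - j)
            b |= ((c >> 2) & 1) << (7 - j)
            c >>= 3
        cmap[i] = (r, g, b)
    return cmap

def colorize(class_map, cmap):
    """Map an (H, W) array of class indices to an (H, W, 3) RGB image."""
    return cmap[class_map]

scores = np.random.default_rng(0).normal(size=(4, 5, 21))  # stand-in for a prediction
class_map = scores.argmax(axis=-1)                         # (4, 5) class indices
rgb = colorize(class_map, voc_colormap())                  # (4, 5, 3) uint8 image
```

The resulting `rgb` array can be passed to any image viewer or plotting call that accepts an H x W x 3 uint8 array.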

keyur_paralkar · September 10, 2018, 11:06am

@lxtGH But argmax will give only one max value among all 21 values. How would I arrange them?
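
When argmax is taken along the class axis, it returns one index per pixel rather than a single value for the whole array, so the result already has the H * W layout. A toy example with 3 classes instead of 21:

```python
import numpy as np

# Toy scores for a 2 x 2 image with 3 classes (the thread uses 21).
scores = np.array([
    [[0.1, 0.7, 0.2], [0.9, 0.05, 0.05]],
    [[0.2, 0.2, 0.6], [0.3, 0.4, 0.3]],
])
class_map = scores.argmax(axis=-1)  # winning class index at every pixel
print(class_map.shape)  # (2, 2)
print(class_map)
# [[1 0]
#  [2 1]]
```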

keyur_paralkar · September 18, 2018, 1:23am

@lxtGH I used res.argmax(-1), where res is my 21-channel prediction. The output generated by res.argmax(-1) is 0, and I get the output in the following way:

This is my model structure for FCN-32:

These are my hyper-parameter settings:

Have I built the model correctly? Why am I getting 0 predictions?

kaixin · September 18, 2018, 8:59am

For how many iterations have you trained the model? I was once in a similar situation, but after enough epochs the model started to output segmentations.
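
One way to tell a model that still predicts only background from a plotting problem is to count how often each class appears in the argmax output. This is a small sketch with a hypothetical helper name; it assumes class 0 is background, as in VOC, and it can be run on the ground-truth masks as well to see how much of each image is background.

```python
import numpy as np

def class_pixel_fractions(class_map, num_classes=21):
    """Fraction of pixels assigned to each class in an (H, W) index map."""
    counts = np.bincount(class_map.ravel(), minlength=num_classes)
    return counts / counts.sum()

pred = np.zeros((4, 5), dtype=np.int64)  # an all-background prediction
fractions = class_pixel_fractions(pred)  # fractions[0] is the background share
```

If `fractions[0]` stays at 1.0 as training goes on, the network has not yet started to separate objects from background.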

keyur_paralkar · September 18, 2018, 11:32am

@kaixin I am running this model for 1 to 4 epochs. Should I increase the number of epochs? What might be the optimal number of epochs to train for?

kaixin · September 18, 2018, 1:16pm

@keyur_paralkar The number of epochs mentioned in the paper is 175. Try 50 or more; I think you will get something other than a blank mask.

keyur_paralkar · September 19, 2018, 1:58am

@kaixin and @lxtGH Thank you both, I am able to get predictions with the FCN-32 model now.