I managed to properly train semantic segmentation models with PyTorch (but not yet on Pascal VOC).
I’d say that you should print the confusion matrix of the training/validation examples during training, and then tune the learning rate and other parameters by inspecting the training/validation metrics.
One more thing, I have the impression that you are not initializing your networks with a network pre-trained on ImageNet, is that the case? Note that in the FCN paper, they mention that they completely failed to train the network from scratch.
I also tried to use UNet; sometimes the loss (BCELoss) gets a NaN value, and the training loss just becomes NaN from then on… what might cause this weird thing? (I just want UNet to predict a feature map with values varying from 0 to 1, so I use BCELoss.)
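One common cause of NaN with BCELoss is the `log(0)` that appears when the sigmoid output saturates at exactly 0 or 1 in float32 (also worth checking: that the targets really lie in [0, 1], and whether a lower learning rate helps). A minimal sketch of the usual fix, computing the loss from raw logits instead of sigmoid outputs:

```python
import torch
import torch.nn as nn

# Extreme raw scores that saturate a sigmoid in float32.
logits = torch.tensor([[40.0, -40.0]])
target = torch.tensor([[0.0, 1.0]])

# BCEWithLogitsLoss fuses the sigmoid into the loss via the
# log-sum-exp trick, so it stays finite even where a separate
# torch.sigmoid(...) + nn.BCELoss() pipeline would hit log(0).
loss = nn.BCEWithLogitsLoss()(logits, target)
print(torch.isfinite(loss).item(), loss.item())  # True 40.0
```

With this, the network's last layer outputs raw scores and you apply `torch.sigmoid` only when you need the [0, 1] map for visualization or thresholding.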
@fmassa Yep, I do not use pre-trained VGG8 for FCN (and also no pre-trained ResNet for PSPNet; I also just saw that SegNet wants pre-trained encoders too…). I will fix this next week.
What do you mean by a confusion matrix? Does this make sense when you classify each pixel and not the complete image?
@brisker
I can’t speak for BCELoss, but maybe you can compare your architecture with mine? I think I use nn.UpsamplingBilinear2d instead of nn.ConvTranspose2d, as recommended by lopuhin in another post here.
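To illustrate the two upsampling options (the channel counts and sizes here are just examples), both double the spatial resolution:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 32, 32)

# Parameter-free bilinear upsampling:
up_bilinear = nn.UpsamplingBilinear2d(scale_factor=2)

# Learned upsampling via transposed convolution, for comparison;
# kernel_size == stride avoids overlapping (checkerboard) outputs:
up_transposed = nn.ConvTranspose2d(64, 64, kernel_size=2, stride=2)

print(up_bilinear(x).shape)    # torch.Size([1, 64, 64, 64])
print(up_transposed(x).shape)  # torch.Size([1, 64, 64, 64])
```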
Yes, it still makes sense to use the confusion matrix, and it’s also how the intersection-over-union metric (which is usually reported for semantic segmentation tasks) is computed.
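As a sketch of how that works for segmentation (my own minimal version, not from any particular repo): treat every pixel as one classification, accumulate a `num_classes x num_classes` confusion matrix over batches, and read the per-class IoU off its diagonal, rows, and columns:

```python
import torch

def confusion_matrix(pred, target, num_classes):
    # pred, target: integer label maps of the same shape.
    idx = target.view(-1) * num_classes + pred.view(-1)
    return torch.bincount(idx, minlength=num_classes ** 2).view(num_classes, num_classes)

def iou_per_class(cm):
    tp = cm.diag().float()
    # IoU_c = TP_c / (TP_c + FP_c + FN_c)
    return tp / (cm.sum(0) + cm.sum(1) - cm.diag()).float()

pred   = torch.tensor([[0, 1], [1, 1]])
target = torch.tensor([[0, 1], [0, 1]])
cm = confusion_matrix(pred, target, num_classes=2)
print(iou_per_class(cm))  # tensor([0.5000, 0.6667]); mean gives mIoU
```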
@bodokaiser, did you find what the problem is with your implementation? I spent a week on your code trying to get reasonable results from your UNet implementation, and the same blank prediction maps are the best I’ve got!!
On the other hand, I have another Theano implementation of UNet which produces the claimed results. So I tried to make the training setup identical for both implementations (e.g. fixed data, fixed hyper-parameters and so on), but again got blank maps with PyTorch!!
I finally found the problem!!
For the last set of convolutions, that is 128 -> 64 -> 64 -> 1, the activation function should not be used!
The activation function causes the values to vanish!
I just removed the nn.ReLU() modules on top of these convolution layers and now everything works great!
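In case it helps anyone, a sketch of what such a fixed head could look like (layer sizes taken from the 128 -> 64 -> 64 -> 1 description above; kernel sizes and padding are my guesses). Whether only the final ReLU or all of them need removing depends on the architecture, but at minimum the final 1-channel convolution should output raw scores, since a ReLU there clamps every negative score to zero and the sigmoid/BCELoss never sees them:

```python
import torch
import torch.nn as nn

head = nn.Sequential(
    nn.Conv2d(128, 64, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 64, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 1, kernel_size=1),  # final layer: no ReLU on top
)

x = torch.randn(1, 128, 16, 16)
print(head(x).shape)  # torch.Size([1, 1, 16, 16])
```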
Could the bad results be because you’re reconstructing the raw float tensor? PyTorch represents the pixel values in [0, 1], while libraries like Caffe accept them in [0, 255]. If that’s the case, multiplying the pixel values by a factor may help.
It’s a hunch, because PSPNet, FCN and SegNet have all been built with Caffe.
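If you want to test that hunch, the rescaling is a one-liner (assuming the input was loaded as a float tensor in [0, 1]):

```python
import torch

img = torch.rand(3, 224, 224)  # float tensor in [0, 1]
img_caffe = img * 255.0        # Caffe-style preprocessing expects [0, 255]
```

Note that Caffe-trained VGG-style models additionally subtract a per-channel mean and expect BGR channel order, so matching the scale alone may not be enough.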
@aicaffeinelife I don’t think it matters too much, as inside the network you will need float tensors due to float arithmetic. However, if you work with medical data (e.g. int16) you can’t directly cast to float16 because of precision loss.
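The precision point is easy to demonstrate: float16 has a 10-bit mantissa, so integers above 2**11 = 2048 are no longer exactly representable, while float32 covers the full int16 range:

```python
import numpy as np

assert int(np.float16(2049)) != 2049    # rounds to the nearest float16 (2048)
assert int(np.float32(2049)) == 2049    # float32 represents all int16 values exactly
assert int(np.float32(32767)) == 32767  # top of the int16 range
```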
Hi @Saeed_Izadi, could you elaborate please on exactly which convolutions you’re referring to and in which network? Are you referring to the networks from @bodokaiser or something else?
Hi @achaiah
Well, I’m referring to the last two convolutional layers, specifically the convolutions with 64-channel inputs. And I’m talking about the UNet model.
Yes
Let me know if you need more information on this
@achaiah
Yeah, I tried about 12 segmentation methods, including PSPNet, on binary masks and all of them are working fine.
I can share my implementations with you if interested
Hey @Saeed_Izadi, if you could share your implementation that would be amazing because I’m looking at using something on binary masks myself but there aren’t that many clear examples out there.
Hi @achaiah. Thanks for your reminder. Honestly, I have a deadline on Aug. 27 ahead, and my code is somewhat messy! Would you please remind me after Aug. 27? You can contact me through email (sent to you)