Semantic segmentation performs badly

(Bodo Kaiser) #1

For piwise, I ported various models from Caffe prototxt to PyTorch, including FCN, SegNet, UNet and PSPNet.

After training them for about 30-40 epochs on the PASCAL VOC12 segmentation dataset, I get the following results:





So PSPNet and FCN8 just output black images. UNet and SegNet give some results, but they are still far from the results claimed in their papers.

Links to networks and training.

Any suggestions as to what the problem may be, or what I could try next?

(Francisco Massa) #2

I managed to properly train semantic segmentation models with PyTorch (but not yet on PASCAL VOC).
I'd suggest printing, during training, the confusion matrix of the training/validation examples you have seen, and then tuning the learning rate and other parameters by inspecting the training/validation metrics.
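A minimal NumPy sketch of accumulating a pixel-wise confusion matrix over the batches seen so far. It assumes integer label maps where class 0 is background and 255 marks void pixels to ignore (the PASCAL VOC convention); `update_confusion` and `per_class_accuracy` are hypothetical helper names. Rows are ground-truth classes and columns are predictions, so a model that only outputs black (all-background) maps puts all of its counts in column 0.

```python
import numpy as np


def update_confusion(conf: np.ndarray, pred: np.ndarray, target: np.ndarray,
                     ignore_index: int = 255) -> np.ndarray:
    """Add the pixels of one batch to a (num_classes x num_classes) confusion matrix.

    Rows index the ground-truth class, columns the predicted class.
    Pixels whose target equals ignore_index (assumed void label) are skipped.
    """
    num_classes = conf.shape[0]
    valid = target != ignore_index
    t = target[valid].astype(np.int64)
    p = pred[valid].astype(np.int64)
    # Encode each (true, predicted) pair as one index and count occurrences.
    counts = np.bincount(num_classes * t + p, minlength=num_classes ** 2)
    conf += counts.reshape(num_classes, num_classes)
    return conf


def per_class_accuracy(conf: np.ndarray) -> np.ndarray:
    """Fraction of each true class's pixels predicted correctly (NaN if the class is absent)."""
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.diag(conf) / conf.sum(axis=1)


# Usage: pred would typically be logits.argmax(1) from the network, converted to NumPy.
conf = np.zeros((2, 2), dtype=np.int64)
pred = np.array([[0, 1], [1, 1]])
target = np.array([[0, 1], [0, 255]])
update_confusion(conf, pred, target)
print(conf.tolist())  # [[1, 1], [0, 1]]
```

Printing this matrix (or its per-class accuracies) every few hundred iterations makes it obvious when a model has collapsed to predicting a single class.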

One more thing: I have the impression that you are not initializing your networks with weights pre-trained on ImageNet. Is that the case? Note that in the FCN paper, the authors mention that they completely failed to train the network from scratch.


I also tried UNet. Sometimes the loss (BCELoss) becomes NaN, and from then on the training loss stays NaN. What might cause this? (I just want UNet to predict a feature map with values between 0 and 1, so I use BCELoss.)
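The thread doesn't settle the cause of the NaN, but a commonly used mitigation is to let the network output raw logits and use `nn.BCEWithLogitsLoss`, which the PyTorch docs describe as more numerically stable than a `Sigmoid` followed by `BCELoss`, and to clip gradients. The sketch below assumes a hypothetical 64-channel feature map feeding a 1x1 output convolution; `head`, `training_step` and `predict_probs` are illustrative names. Applying `torch.sigmoid` at inference still yields a map with values in [0, 1].

```python
import torch
import torch.nn as nn

# Hypothetical final layer: 64 feature channels -> 1 logit per pixel, no Sigmoid.
head = nn.Conv2d(64, 1, kernel_size=1)
criterion = nn.BCEWithLogitsLoss()  # applies the sigmoid internally, in a stable way


def training_step(features: torch.Tensor, target: torch.Tensor,
                  optimizer: torch.optim.Optimizer, max_norm: float = 1.0) -> float:
    logits = head(features)
    loss = criterion(logits, target)
    # Stop at the first non-finite loss instead of silently training on NaNs.
    if not torch.isfinite(loss):
        raise RuntimeError("loss became non-finite; inspect inputs and learning rate")
    optimizer.zero_grad()
    loss.backward()
    # Gradient clipping limits the size of individual updates.
    torch.nn.utils.clip_grad_norm_(head.parameters(), max_norm)
    optimizer.step()
    return loss.item()


@torch.no_grad()
def predict_probs(features: torch.Tensor) -> torch.Tensor:
    # Convert logits to per-pixel probabilities in [0, 1] only at inference time.
    return torch.sigmoid(head(features))
```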

(Bodo Kaiser) #4

@fmassa Yep, I do not use a pre-trained VGG16 for FCN (nor a pre-trained ResNet for PSPNet; I also just noticed that SegNet wants pre-trained encoders too…). I will fix this next week.

What do you mean by confusion matrix? Does it make sense when you classify each pixel rather than the whole image?

I can't speak for BCELoss, but maybe you can compare your architecture with mine? I use nn.UpsamplingBilinear2d instead of nn.ConvTranspose2d, as recommended by lopuhin in another post here.
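For reference, a minimal sketch of the two ways of doubling the spatial size of a feature map. The channel count and kernel settings are assumptions, and `bilinear_kernel` is a hypothetical helper. `nn.Upsample(mode="bilinear")` is the current equivalent of `nn.UpsamplingBilinear2d` (which corresponds to `align_corners=True`); it has no learnable parameters, while `nn.ConvTranspose2d` learns its kernel. Initializing that kernel to bilinear weights, a trick used in the FCN paper, makes the transposed convolution start out as plain bilinear interpolation.

```python
import torch
import torch.nn as nn


def bilinear_kernel(channels: int, kernel_size: int) -> torch.Tensor:
    """Per-channel bilinear interpolation weights for a ConvTranspose2d layer."""
    factor = (kernel_size + 1) // 2
    center = factor - 1 if kernel_size % 2 == 1 else factor - 0.5
    pos = torch.arange(kernel_size, dtype=torch.float32)
    filt_1d = 1 - torch.abs(pos - center) / factor
    filt = filt_1d[:, None] * filt_1d[None, :]
    # Shape (in_channels, out_channels, k, k); each channel maps only to itself.
    weight = torch.zeros(channels, channels, kernel_size, kernel_size)
    for c in range(channels):
        weight[c, c] = filt
    return weight


channels = 64  # assumed number of feature channels

# Option 1: fixed bilinear upsampling, no learnable parameters.
upsample_fixed = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)

# Option 2: learnable transposed convolution; kernel 4, stride 2, padding 1 doubles H and W.
upsample_learned = nn.ConvTranspose2d(channels, channels, kernel_size=4, stride=2,
                                      padding=1, bias=False)
with torch.no_grad():
    upsample_learned.weight.copy_(bilinear_kernel(channels, 4))

x = torch.randn(1, channels, 16, 16)
print(upsample_fixed(x).shape, upsample_learned(x).shape)  # both torch.Size([1, 64, 32, 32])
```

The fixed variant cannot drift during training, which is one reason it is sometimes preferred as a simpler, parameter-free default.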


Why use nn.UpsamplingBilinear2d instead of nn.ConvTranspose2d?

(Francisco Massa) #6

Yes, it still makes sense to use the confusion matrix, and it is also what is used to compute the intersection over union (IoU) metric, which is commonly used in semantic segmentation tasks.
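A minimal NumPy sketch of deriving per-class IoU from a confusion matrix whose rows are ground-truth classes and columns are predictions (an assumed layout). For class c, IoU = TP / (TP + FP + FN): TP is the diagonal entry, FP the rest of column c, and FN the rest of row c. Classes that never appear in either the targets or the predictions get NaN and are skipped by `np.nanmean`; `iou_from_confusion` is a hypothetical helper name.

```python
import numpy as np


def iou_from_confusion(conf: np.ndarray) -> np.ndarray:
    """Per-class intersection over union from a (truth x prediction) confusion matrix."""
    tp = np.diag(conf).astype(np.float64)
    fp = conf.sum(axis=0) - tp  # predicted as c, but truly another class
    fn = conf.sum(axis=1) - tp  # truly c, but predicted as another class
    with np.errstate(divide="ignore", invalid="ignore"):
        return tp / (tp + fp + fn)


conf = np.array([[1, 1],
                 [0, 1]])
iou = iou_from_confusion(conf)
print(iou, np.nanmean(iou))  # [0.5 0.5] 0.5
```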

(Saeed Izadi) #7

Hi there,

@bodokaiser, did you find out what the problem is with your implementation? I spent a week on your code trying to get reasonable results from your UNet implementation, and the same blank prediction maps are the best I've got!!
On the other hand, I have a separate Theano implementation of UNet that produces the claimed results. So I tried to make the training setup identical for both implementations (e.g. fixed data, fixed hyper-parameters and so on), but again I get blank maps with PyTorch!!

(Saeed Izadi) #8

I finally found the problem!!
For the last set of convolutions, that is 128 -> 64 -> 64 -> 1, no activation function should be used!
The activation functions cause the values to vanish!

I just removed the nn.ReLU() modules on top of these convolution layers and now everything works great!
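A minimal sketch of the fix described above, assuming a UNet output head with 3x3 convolutions (padding 1) for 128 -> 64 -> 64 and a 1x1 convolution for 64 -> 1; the kernel sizes and the `make_head` name are assumptions, not the code from the thread. With `with_relu=True` it reproduces the problematic variant with a ReLU on top of each convolution. A ReLU after the last convolution clamps every negative pre-activation to zero, so if the outputs drift negative the whole map comes out blank and no gradient flows back through those pixels. Without it, the head emits raw scores (logits) that can be fed to a loss such as `nn.BCEWithLogitsLoss`.

```python
import torch
import torch.nn as nn


def make_head(with_relu: bool = False) -> nn.Sequential:
    """Hypothetical UNet output head: 128 -> 64 -> 64 -> 1 convolutions."""
    channels = [128, 64, 64, 1]
    layers = []
    for c_in, c_out in zip(channels[:-1], channels[1:]):
        k = 1 if c_out == 1 else 3  # assumed: 3x3 convs, 1x1 for the final projection
        layers.append(nn.Conv2d(c_in, c_out, kernel_size=k, padding=k // 2))
        if with_relu:
            # The problematic variant: a ReLU on top of each of these convolutions.
            layers.append(nn.ReLU())
    return nn.Sequential(*layers)


x = torch.randn(2, 128, 16, 16)
logits = make_head()(x)                   # shape (2, 1, 16, 16); values may be negative
clamped = make_head(with_relu=True)(x)    # never negative: ReLU clamps at zero
```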


(Ankit ) #9

Hi there,

Could the bad results be because you're reconstructing from the raw float tensor? PyTorch (e.g. torchvision's ToTensor) represents pixel values in [0, 1], while libraries like Caffe expect them in [0, 255]. If that's the case, maybe multiplying the pixel values by a factor would help?

It's just a hunch, since PSPNet, FCN and SegNet were all originally built with Caffe.
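If the value range is the issue, a minimal conversion from a [0, 1] RGB array to Caffe-style input could look like the sketch below. The BGR channel order and the optional per-channel mean subtraction are assumptions about how the original Caffe models were trained (check each model's prototxt and preprocessing), and `to_caffe_input` is a hypothetical helper.

```python
import numpy as np


def to_caffe_input(img_rgb01: np.ndarray, mean_bgr=None) -> np.ndarray:
    """Convert a CHW RGB image in [0, 1] to a CHW BGR image in [0, 255].

    mean_bgr: optional per-channel mean (in the 0-255 range) to subtract,
    assumed to match whatever the Caffe model was trained with.
    """
    x = img_rgb01.astype(np.float32) * 255.0  # rescale [0, 1] -> [0, 255]
    x = x[::-1].copy()                          # reverse channel axis: RGB -> BGR
    if mean_bgr is not None:
        x -= np.asarray(mean_bgr, dtype=np.float32).reshape(3, 1, 1)
    return x
```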

(Bodo Kaiser) #10

@aicaffeinelife I don't think it matters too much, as inside the network you need float tensors anyway for floating-point arithmetic. However, if you work with medical data (e.g. int16), you can't directly cast to float16 because of precision loss.
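A quick NumPy illustration of the precision issue: float16 has an 11-bit significand, so integers above 2048 are no longer all exactly representable, while float32 represents every int16 value exactly. The sample values are arbitrary and `roundtrip_error` is a hypothetical helper.

```python
import numpy as np


def roundtrip_error(values: np.ndarray, float_dtype) -> np.ndarray:
    """Absolute error after casting integers to float_dtype and back."""
    as_float = values.astype(float_dtype)
    return np.abs(as_float.astype(np.int64) - values.astype(np.int64))


vals = np.array([1000, 2049, 4097, 32767], dtype=np.int16)
print(roundtrip_error(vals, np.float16))  # [0 1 1 1]
print(roundtrip_error(vals, np.float32))  # [0 0 0 0]
```

Casting int16 data to float32 first, and only then rescaling it, avoids this loss.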

(Ankit ) #11

@bodokaiser I was referring to the place where the bilinear upsampling takes place. I'll take a look at it and get back to this thread later.


Hi @Saeed_Izadi, could you please elaborate on exactly which convolutions you're referring to, and in which network? Are you referring to the networks from @bodokaiser or something else?

(Saeed Izadi) #13

Hi @achaiah,
Well, I'm referring to the last two convolutional layers, specifically the convolutions with 64-channel inputs, in the UNet model.
Let me know if you need more information on this.


OK, thanks for the clarification. Out of curiosity, have you tried PSPNet at all?

(Saeed Izadi) #15

Yeah, I tried about 12 segmentation methods, including PSPNet, on binary masks, and all of them work fine.
I can share my implementations with you if you're interested.


Hey @Saeed_Izadi, if you could share your implementation, that would be amazing, because I'm looking at using something on binary masks myself, but there aren't many clear examples out there.


(Saeed Izadi) #17

Hey @achaiah,
do you mind if I share my implementations in a couple of days?


Yes, of course, please share when convenient.


Hi @Saeed_Izadi, just figured I'd drop a friendly reminder here in case you get a chance to share your implementation. 🙂


(Saeed Izadi) #20

Hi @achaiah, thanks for the reminder. Honestly, I have a deadline coming up on Aug. 27, and my code is somewhat messy! Would you please remind me after Aug. 27? You can contact me by email (I've sent you my address).