I managed to properly train semantic segmentation models with PyTorch (but not yet on Pascal VOC).
I’d say that you should print the confusion matrix of the training/validation examples during training, and then tune the learning rate and other parameters by inspecting the training/validation metrics.
One more thing, I have the impression that you are not initializing your networks with a network pre-trained on ImageNet, is that the case? Note that in the FCN paper, they mention that they completely failed to train the network from scratch.
I also tried to use UNet; sometimes the loss (BCELoss) gets a NaN value, and the training loss just becomes NaN from then on… what might cause this weird thing? (I just want UNet to predict a feature map with values varying from 0 to 1, so I use BCELoss.)
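One common cause of NaN with BCELoss is the `log(0)` that appears when the sigmoid output saturates at exactly 0 or 1 in float32 (also worth checking: that the targets really lie in [0, 1], and whether a lower learning rate helps). A minimal sketch of the usual fix, computing the loss from raw logits instead of sigmoid outputs:

```python
import torch
import torch.nn as nn

# Extreme raw scores that saturate a sigmoid in float32.
logits = torch.tensor([[40.0, -40.0]])
target = torch.tensor([[0.0, 1.0]])

# BCEWithLogitsLoss fuses the sigmoid into the loss via the
# log-sum-exp trick, so it stays finite even where a separate
# torch.sigmoid(...) + nn.BCELoss() pipeline would hit log(0).
loss = nn.BCEWithLogitsLoss()(logits, target)
print(torch.isfinite(loss).item(), loss.item())  # True 40.0
```

With this, the network's last layer outputs raw scores and you apply `torch.sigmoid` only when you need the [0, 1] map for visualization or thresholding.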
@fmassa Yep, I do not use pre-trained VGG8 for FCN (and also no pre-trained ResNet for PSPNet; I also just saw that SegNet wants pre-trained encoders too…). I will fix this next week.
What do you mean by a confusion matrix? Does this make sense when you classify each pixel and not the complete image?
@brisker
I can’t speak for BCELoss, but maybe you can compare your architecture with mine? I think I use nn.UpsamplingBilinear2d instead of nn.ConvTranspose2d, as recommended by lopuhin in another post here.
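To illustrate the two upsampling options (the channel counts and sizes here are just examples), both double the spatial resolution:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 32, 32)

# Parameter-free bilinear upsampling:
up_bilinear = nn.UpsamplingBilinear2d(scale_factor=2)

# Learned upsampling via transposed convolution, for comparison;
# kernel_size == stride avoids overlapping (checkerboard) outputs:
up_transposed = nn.ConvTranspose2d(64, 64, kernel_size=2, stride=2)

print(up_bilinear(x).shape)    # torch.Size([1, 64, 64, 64])
print(up_transposed(x).shape)  # torch.Size([1, 64, 64, 64])
```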
Yes, it still makes sense to use the confusion matrix, and it’s also how the intersection-over-union metric (which is usually reported for semantic segmentation tasks) is computed.
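As a sketch of how that works for segmentation (my own minimal version, not from any particular repo): treat every pixel as one classification, accumulate a `num_classes x num_classes` confusion matrix over batches, and read the per-class IoU off its diagonal, rows, and columns:

```python
import torch

def confusion_matrix(pred, target, num_classes):
    # pred, target: integer label maps of the same shape.
    idx = target.view(-1) * num_classes + pred.view(-1)
    return torch.bincount(idx, minlength=num_classes ** 2).view(num_classes, num_classes)

def iou_per_class(cm):
    tp = cm.diag().float()
    # IoU_c = TP_c / (TP_c + FP_c + FN_c)
    return tp / (cm.sum(0) + cm.sum(1) - cm.diag()).float()

pred   = torch.tensor([[0, 1], [1, 1]])
target = torch.tensor([[0, 1], [0, 1]])
cm = confusion_matrix(pred, target, num_classes=2)
print(iou_per_class(cm))  # tensor([0.5000, 0.6667]); mean gives mIoU
```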
@bodokaiser, did you find what the problem is with your implementation? I spent a week on your code trying to get reasonable results from your UNet implementation, and the same blank prediction maps are the best I’ve got!!
On the other hand, I have another Theano implementation of UNet which produces the claimed results. So I tried to make the training setup identical for both implementations (e.g. fixed data, fixed hyper-parameters and so on), but again got blank maps with PyTorch!!
I finally found the problem!!
For the last set of convolutions, that is 128 -> 64 -> 64 -> 1, the activation function should not be used!
The activation function causes the values to vanish!
I just removed the nn.ReLU() modules on top of these convolution layers and now everything works great!
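In case it helps anyone, a sketch of what such a fixed head could look like (layer sizes taken from the 128 -> 64 -> 64 -> 1 description above; kernel sizes and padding are my guesses). Whether only the final ReLU or all of them need removing depends on the architecture, but at minimum the final 1-channel convolution should output raw scores, since a ReLU there clamps every negative score to zero and the sigmoid/BCELoss never sees them:

```python
import torch
import torch.nn as nn

head = nn.Sequential(
    nn.Conv2d(128, 64, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 64, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 1, kernel_size=1),  # final layer: no ReLU on top
)

x = torch.randn(1, 128, 16, 16)
print(head(x).shape)  # torch.Size([1, 1, 16, 16])
```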
Could the bad results be because you’re reconstructing the raw float tensor? PyTorch represents the pixel values in [0, 1], while libraries like Caffe accept them in [0, 255]. If that’s the case, multiplying the pixel values by a factor may help.
It’s a hunch, because PSPNet, FCN and SegNet have all been built with Caffe.
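If you want to test that hunch, the rescaling is a one-liner (assuming the input was loaded as a float tensor in [0, 1]):

```python
import torch

img = torch.rand(3, 224, 224)  # float tensor in [0, 1]
img_caffe = img * 255.0        # Caffe-style preprocessing expects [0, 255]
```

Note that Caffe-trained VGG-style models additionally subtract a per-channel mean and expect BGR channel order, so matching the scale alone may not be enough.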
@aicaffeinelife I don’t think it matters too much, as inside the network you will need float tensors due to float arithmetic. However, if you work with medical data (e.g. int16) you can’t directly cast to float16 because of precision loss.
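The precision point is easy to demonstrate: float16 has a 10-bit mantissa, so integers above 2**11 = 2048 are no longer exactly representable, while float32 covers the full int16 range:

```python
import numpy as np

assert int(np.float16(2049)) != 2049    # rounds to the nearest float16 (2048)
assert int(np.float32(2049)) == 2049    # float32 represents all int16 values exactly
assert int(np.float32(32767)) == 32767  # top of the int16 range
```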
Hi @Saeed_Izadi, could you elaborate please on exactly which convolutions you’re referring to and in which network? Are you referring to the networks from @bodokaiser or something else?
Hi @achaiah
Well, I’m referring to the last two convolutional layers, specifically the convolutions with 64-channel inputs. And I’m talking about the UNet model.
Yes
Let me know if you need more information on this
@achaiah
Yeah, I tried about 12 segmentation methods, including PSPNet, on binary masks and all of them are working fine.
I can share my implementations with you if interested
Hey @Saeed_Izadi, if you could share your implementation that would be amazing because I’m looking at using something on binary masks myself but there aren’t that many clear examples out there.
Hi @achaiah. Thanks for your reminder. Honestly, I have a deadline on Aug. 27 ahead, and my code is somewhat messy! Would you please remind me after Aug. 27? You can contact me through email (sent to you)