Segmentation question


I am reading about CNN segmentation architectures and I am confused about the output layer. Most of them use two output channels in the final layer: one for the background probability and one for the foreground probability. Doesn't it make more sense to have a single output channel for the foreground pixels? The probability of a background pixel would just be 1 - Pf, right? Am I missing something? Are there any advantages to having two output channels?
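The premise can be checked on a single pixel. The following is a small pure-Python sketch with made-up logit values; the helper names are illustrative and not taken from any particular architecture. It shows that a two-channel softmax over [background, foreground] logits gives the same foreground probability as a sigmoid applied to the single logit z_fg - z_bg. A one-channel head can therefore express the same thing, and the background probability is indeed 1 - Pf.

```python
import math

def sigmoid(z):
    """Foreground probability from a single logit (one-channel head)."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(logits):
    """Numerically stable softmax over a list of channel logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# One pixel from a hypothetical two-channel head: [background logit, foreground logit]
z_bg, z_fg = 0.3, 1.7
p_bg, p_fg = softmax([z_bg, z_fg])

# A one-channel head that outputs the logit difference gives the same probabilities
p_fg_single = sigmoid(z_fg - z_bg)
p_bg_single = 1.0 - p_fg_single

print("two-channel:", round(p_bg, 6), round(p_fg, 6))
print("one-channel:", round(p_bg_single, 6), round(p_fg_single, 6))
```

The two-channel version carries one redundant degree of freedom, because adding the same constant to both logits leaves the softmax unchanged.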


Both work. With a one-channel output you save a few parameters and are doing binary classification on each pixel. You would use binary cross entropy as your loss function and choose a threshold for your final predictions. With an output of two or more channels you can apply a softmax across the channel dimension and use cross entropy. You probably see this more often because it generalizes to multiple classes at once (e.g. segmenting cars, trees, road, and background).
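Here is a minimal pure-Python sketch of both setups on a toy four-pixel "image". It assumes made-up logits, a 0.5 threshold, and hypothetical class labels (0 = background, 1 = car, 2 = tree, 3 = road). In a real framework these would be tensor operations over the whole output map. The loops here only make the per-pixel computation explicit.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce_loss(logits, targets):
    """Mean binary cross entropy; one logit per pixel, targets are 0/1."""
    total = 0.0
    for z, t in zip(logits, targets):
        p = sigmoid(z)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(logits)

def log_softmax(logits):
    """Log-probabilities over the channel dimension for one pixel."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]

def ce_loss(logits, targets):
    """Mean cross entropy; C channel logits per pixel, targets are class indices."""
    total = 0.0
    for zs, t in zip(logits, targets):
        total += -log_softmax(zs)[t]
    return total / len(logits)

# One-channel head: binary classification per pixel, then threshold (0.5 is an assumed choice)
fg_logits = [2.0, -1.0, 0.5, -3.0]
fg_targets = [1, 0, 1, 0]
threshold = 0.5
binary_mask = [1 if sigmoid(z) > threshold else 0 for z in fg_logits]

# Multi-channel head: softmax + cross entropy, prediction is the argmax channel
mc_logits = [
    [0.1, 2.0, -1.0, 0.0],
    [1.5, 0.2, 0.3, -0.5],
    [0.0, -1.0, 2.5, 0.1],
    [-0.2, 0.0, 0.1, 1.9],
]
mc_targets = [1, 0, 2, 3]
label_map = [max(range(len(zs)), key=lambda c: zs[c]) for zs in mc_logits]

print("binary mask:", binary_mask, "BCE:", round(bce_loss(fg_logits, fg_targets), 4))
print("label map:", label_map, "CE:", round(ce_loss(mc_logits, mc_targets), 4))
```

With exactly two channels, cross entropy on logits [0, z] reduces to binary cross entropy on z. This is another way of seeing that both options work.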
