Combining Segmentation Predictions from Multiple Networks

For defect segmentation I have trained a distinct segmentation network for each defect class, i.e. 20 classes -> 20 models, where each model outputs a sigmoid-activated segmentation map of shape Bx1xHxW.

Now I want to fuse all predictions into a single segmentation map of shape Bx20xHxW.
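The fusion step I have in mind, as a rough PyTorch sketch (the names `models`, `images`, and `fuse_predictions` are placeholders, not code from my project):

```python
import torch

def fuse_predictions(models, images):
    # Run every per-class model and stack the Bx1xHxW sigmoid maps
    # along the channel dimension -> Bx20xHxW.
    with torch.no_grad():
        maps = [m(images) for m in models]   # 20 tensors of shape (B,1,H,W)
        fused = torch.cat(maps, dim=1)       # (B,20,H,W)
    # A per-pixel class decision would then be an argmax over channels,
    # which is exactly what breaks down when every map saturates near 1.
    labels = fused.argmax(dim=1)             # (B,H,W)
    return fused, labels
```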

The main issue is that the network predictions are always close to 1 (0.97-0.999), even for false positives, which makes a pixel-wise class decision impossible.

I have already tried using negative examples from other classes during training to lower the confidence for false positives, but this didn't change much.
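To clarify what I mean by negative examples, here is a minimal sketch of the idea, assuming PyTorch and binary-cross-entropy training; `model`, `pos_images`, `pos_masks`, and `neg_images` are illustrative names, not from my actual code:

```python
import torch
import torch.nn.functional as F

def training_step(model, pos_images, pos_masks, neg_images):
    # Positive samples: images of this defect with their ground-truth masks.
    pos_pred = model(pos_images)  # (B,1,H,W), sigmoid output
    loss_pos = F.binary_cross_entropy(pos_pred, pos_masks)

    # Negative samples: images of *other* defect classes, trained against
    # an all-zero mask to push the model's confidence down there.
    neg_pred = model(neg_images)
    loss_neg = F.binary_cross_entropy(neg_pred, torch.zeros_like(neg_pred))

    return loss_pos + loss_neg
```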

Any ideas how I can improve my training procedure so that the network outputs are more evenly distributed between 0 and 1? Unfortunately, the single-network-per-class setup is a requirement.

I assume you’ve trained each model in a binary fashion, i.e. each model predicts only its corresponding class as the positive class and treats all other classes as negative?
If so, what is the performance of each model (confusion matrix)?
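Something like this is what I mean, sketched per model with predictions thresholded at 0.5 (names and the threshold are illustrative assumptions):

```python
import torch

def pixel_confusion(pred_map, target_mask, threshold=0.5):
    # pred_map: (B,1,H,W) sigmoid output; target_mask: (B,1,H,W) in {0,1}.
    pred = pred_map > threshold
    target = target_mask.bool()
    tp = (pred & target).sum().item()    # defect pixels correctly found
    fp = (pred & ~target).sum().item()   # background predicted as defect
    fn = (~pred & target).sum().item()   # defect pixels missed
    tn = (~pred & ~target).sum().item()  # background correctly rejected
    return tp, fp, fn, tn
```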


So far I have only evaluated the Dice score, which hovers around 0.8 for most defects but can be lower or higher.
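For reference, the Dice score I compute looks roughly like this, assuming hard (thresholded) predictions; the `eps` term to avoid division by zero is my own addition:

```python
import torch

def dice_score(pred_map, target_mask, threshold=0.5, eps=1e-7):
    # Dice = 2*|P & T| / (|P| + |T|) over binarized prediction and target.
    pred = (pred_map > threshold).float()
    target = target_mask.float()
    intersection = (pred * target).sum()
    return (2 * intersection / (pred.sum() + target.sum() + eps)).item()
```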

What additional information do you think the confusion matrix can tell me here? I am fully aware that false positives happen; I just want their confidence to be lower than that of the correct predictions.