Calibrating probability output and threshold

So two things:

  • In my experience, the first thing to do is to balance the pixels better. For example, when training the U-Net for nodule detection in our book (a very imbalanced problem), we make sure that enough slices containing nodules are fed into the U-Net; there is a brief discussion in section 13.5.5 (Designing our training and validation data). We didn’t write an entire paper about it, but in our situation it made the difference between “doesn’t learn” (even with weighted Dice) and “learns”. A sampling sketch is included after this list.
  • If you have some holdout data, it is very reasonable to tune the threshold on it, but if you trained your model well, it probably matters less: U-Net class probabilities tend to exhibit, to some degree, the same overconfidence known from regular classification (i.e. they are either almost zero or almost one most of the time). A small threshold sweep is also sketched below.
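For the balancing, here is a minimal sketch of one way to do it with PyTorch’s `WeightedRandomSampler`; the `slice_dataset` and the per-slice `has_nodule` flags are placeholders for illustration, not the book’s actual dataset classes (the book builds the balancing into the dataset itself rather than the sampler):

```python
import torch
from torch.utils.data import DataLoader, WeightedRandomSampler

def make_balanced_loader(slice_dataset, has_nodule, batch_size=16):
    # slice_dataset: hypothetical map-style Dataset of 2D slices
    # has_nodule: one bool per slice, True if the slice contains nodule pixels
    has_nodule = torch.as_tensor(has_nodule, dtype=torch.bool)
    n_pos = int(has_nodule.sum())
    n_neg = len(has_nodule) - n_pos

    # Weight each slice inversely to the size of its group, so slices with
    # and without nodules are drawn about equally often.
    weights = torch.empty(len(has_nodule))
    weights[has_nodule] = 1.0 / max(n_pos, 1)
    weights[~has_nodule] = 1.0 / max(n_neg, 1)

    sampler = WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)
    return DataLoader(slice_dataset, batch_size=batch_size, sampler=sampler)
```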
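For the threshold, a simple sweep over the holdout set that picks the cutoff maximizing Dice could look like this; `model`, `holdout_loader`, and the assumption of one logit per pixel with binary masks are mine, adapt to your setup:

```python
import torch

def dice_score(pred_mask, true_mask, eps=1e-7):
    # Dice on binary masks: 2*|A ∩ B| / (|A| + |B|)
    inter = (pred_mask & true_mask).sum().item()
    total = pred_mask.sum().item() + true_mask.sum().item()
    return (2 * inter + eps) / (total + eps)

@torch.no_grad()
def pick_threshold(model, holdout_loader, device="cpu"):
    # Assumes batches of (image, mask) pairs with mask values in {0, 1}.
    model.eval()
    probs, masks = [], []
    for img, mask in holdout_loader:
        p = torch.sigmoid(model(img.to(device))).cpu()
        probs.append(p.flatten())
        masks.append(mask.flatten().bool())
    probs = torch.cat(probs)
    masks = torch.cat(masks)

    best_t, best_dice = 0.5, -1.0
    for t in torch.linspace(0.05, 0.95, 19):
        d = dice_score(probs > t, masks)
        if d > best_dice:
            best_t, best_dice = float(t), d
    return best_t, best_dice
```

If the probabilities really are saturated near zero and one, you will see the Dice curve stay nearly flat over a wide range of thresholds, which is the “it probably matters less” case above.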

Best regards

Thomas