How relevant are negative examples for a U-Net segmentation model?

Hello, I am working with a U-Net segmentation model for medical image analysis, and I would like to know how important negative examples (with empty masks) are for my model to learn that some images are 100% negative. I am asking because I took a bunch of negative examples and added them to my dataset, a kind of hard negative mining, and I am still getting a lot of false positives. Does U-Net learn anything from negative examples? Is there any other way to force my model to learn these 'negative' features?

My dataset is very skewed: I am working with medical images and want to segment small specimens in 256x256 patches. The number of negative patches (no specimen) and the number of positive patches are balanced, but positive pixels make up only 1-2% of the total.
I read that weighted sigmoid cross-entropy might be a good way to attenuate this imbalance and lower the false positive rate.
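
For reference, here is a minimal sketch of what that could look like in PyTorch, assuming a single-channel U-Net output, using BCEWithLogitsLoss, whose pos_weight argument implements the per-class weighting (the value 10.0 and the random tensors are placeholders, not a recommendation):

```python
import torch
import torch.nn as nn

# Sketch of weighted sigmoid cross-entropy via BCEWithLogitsLoss.
# pos_weight multiplies the loss on positive pixels: values > 1 favor
# recall (more false positives), values < 1 favor precision (fewer
# false positives). 10.0 is purely illustrative.
criterion = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([10.0]))

logits = torch.randn(4, 1, 256, 256)                  # stand-in for raw U-Net output
masks = (torch.rand(4, 1, 256, 256) > 0.98).float()   # ~2% positive pixels, toy data
loss = criterion(logits, masks)
print(loss.item())
```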

I would like to get your opinion.

If you could also point me to relevant material on this (articles, papers, related questions), I would highly appreciate it.

I saw your issue on a PyTorch U-Net implementation (https://github.com/milesial/Pytorch-UNet/issues/126); I have the exact same problem with U-Net! At the moment I'm getting a fairly high Dice coefficient (> 0.8) but poor predicted segmentation masks. I looked through the mask predictions, and negative samples are definitely skewing both the accuracy evaluation and the learning process!
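
To illustrate why negative samples can inflate the metric: with the smoothed Dice formulation that many implementations use (a sketch below, not necessarily the exact one in that repo), an all-empty ground truth paired with an all-empty prediction scores a perfect 1.0, so a dataset with many negatives can report high average Dice even while the positive masks are poorly segmented:

```python
import torch

def dice_coeff(pred, target, smooth=1.0):
    # A typical smoothed Dice; `smooth` avoids division by zero.
    inter = (pred * target).sum()
    return (2 * inter + smooth) / (pred.sum() + target.sum() + smooth)

empty_mask = torch.zeros(1, 256, 256)      # negative sample: no specimen
empty_pred = torch.zeros(1, 256, 256)      # model predicts nothing
print(dice_coeff(empty_pred, empty_mask))  # tensor(1.) -- a perfect score
```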

How balanced is your dataset?

Approximately 3:2, biased towards negative samples. What about you? I'm thinking about trying torch.utils.data.sampler.WeightedRandomSampler to counteract the dataset imbalance. However, I'm not sure the Dice coefficient is the most suitable evaluation metric (it was not specified in the paper).
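
In case it helps, here is a minimal sketch of how that could be wired up, assuming per-patch labels (1 = patch contains a specimen) are available; the tensors are toy placeholders for the real dataset:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Toy stand-in for the real dataset: 60 negative and 40 positive
# patches, roughly the 3:2 skew mentioned above.
labels = torch.tensor([0] * 60 + [1] * 40)
patches = torch.randn(100, 1, 256, 256)
dataset = TensorDataset(patches, labels)

# Weight each sample inversely to its class frequency, so positive and
# negative patches are drawn roughly equally often.
class_counts = torch.bincount(labels)               # tensor([60, 40])
sample_weights = 1.0 / class_counts[labels].float()
sampler = WeightedRandomSampler(sample_weights, num_samples=len(labels),
                                replacement=True)
loader = DataLoader(dataset, batch_size=8, sampler=sampler)
```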

I have a very skewed dataset, roughly 1:99 positive to negative, and adding negative examples is not reducing the false positive rate.

I think my problem is that my balancing is too favorable to the positive class. Maybe I need to modify the weights assigned to each class. (Ironically, with my balancing equations I may actually be increasing the false positive rate by adding more negative examples: if the positive-class weight grows with the number of negative pixels, every new negative patch pushes the loss further toward predicting positives.)
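
For what it's worth, one way to derive the positive-class weight from pixel counts rather than a fixed balancing equation (a sketch on toy data, not my actual setup):

```python
import torch

# Ratio of negative to positive pixels over the training masks. With
# 1-2% positive pixels this lands around 50-100, which may be too
# aggressive as a pos_weight; scaling it down trades recall for fewer
# false positives. The random masks below are toy stand-ins.
masks = (torch.rand(100, 1, 256, 256) > 0.985).float()   # ~1.5% positive
n_pos = masks.sum()
n_neg = masks.numel() - n_pos
pos_weight = n_neg / n_pos
print(pos_weight)   # roughly 65 on this toy data
```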

I'm not too sure how to solve my own issue, but to control the false positive rate one can tweak the output threshold of the FCN. To reduce false positives, raising the threshold above the default 0.5 might help.
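
A quick sketch of what that sweep could look like (random tensors as placeholders for real predictions and ground truth):

```python
import torch

logits = torch.randn(4, 1, 256, 256)                   # stand-in for FCN output
target = (torch.rand(4, 1, 256, 256) > 0.98).float()   # toy ground truth
probs = torch.sigmoid(logits)

# Sweep the decision threshold: raising it above the default 0.5
# trades recall for a lower false positive rate.
for t in (0.5, 0.7, 0.9):
    pred = (probs > t).float()
    fp = ((pred == 1) & (target == 0)).sum().item()
    tp = ((pred == 1) & (target == 1)).sum().item()
    print(f"threshold={t}: TP={tp}, FP={fp}")
```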