Force single blob segmentation in U-Net model

I have a vanilla U-Net model for two-class image segmentation.
The classes are background and a single type of object.

The object is always a connected set of pixels.
As a binary mask it is a single blob of pixels with no holes in it.

I wonder if there is a loss function to force such a shape on the model output.
As of now, the output I get is sometimes several blobs.

I currently use the Tversky loss.

I thought about adding a loss based on the maximum and minimum coordinates of the target blob and the output blob, i.e., enforcing the bounding rectangle. Is there a gradient-friendly way to do so?

Hi Avi!

As you describe it, you might choose to treat your problem as one
of instance segmentation, rather than just semantic segmentation.

I don’t think that there is a natural loss function* for this kind of thing.

However, you might be able to contrive something. For example, you
might say that if a pixel is predicted to be a background pixel, but some
of the pixels on either side of it are foreground pixels, it should be given
some additional penalty for being classified as background. But this is
exactly the kind of thing – the status of adjacent pixels – that should be
able to be learned by a semantic-segmentation model (such as U-Net).
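
For concreteness, here is a minimal sketch of such a contrived penalty term
(the function name, kernel, and weighting are my own choices, not an
established loss): it charges pixels predicted as background in proportion
to how many of their eight neighbors are predicted as foreground.

```python
import torch
import torch.nn.functional as F

def adjacency_penalty(logits):
    # logits: (N, 1, H, W) raw foreground scores from the U-Net
    probs = torch.sigmoid(logits)
    # fraction of the eight neighbors predicted as foreground
    kernel = torch.ones(1, 1, 3, 3, device=logits.device) / 8.0
    kernel[0, 0, 1, 1] = 0.0  # exclude the center pixel
    neighbor_fg = F.conv2d(probs, kernel, padding=1)
    # penalize "background" predictions surrounded by foreground
    return ((1.0 - probs) * neighbor_fg).mean()
```

You would add this, with a small weight, to your existing loss and verify
empirically that it actually helps.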

Is your use case such that each image has exactly one blob in it?
If so, predicting a single blob might be as simple as closing (dilating
and then eroding) the prediction of your U-Net. (Note that, although
quite simple, this is a version of semantic segmentation followed by
post-processing.)
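
As a sketch of that post-processing step (the threshold and the
structuring-element size are assumptions you would tune):

```python
import numpy as np
from scipy import ndimage

def close_prediction(probs, threshold=0.5, size=5):
    # probs: (H, W) foreground probabilities from the U-Net
    mask = probs > threshold
    # closing = dilation followed by erosion; the structuring-element
    # size controls how large a gap between fragments gets bridged
    return ndimage.binary_closing(mask, structure=np.ones((size, size)))
```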

If a single image can contain multiple separate blobs, you’re really
now looking at an instance segmentation problem and you should
consider instance-segmentation algorithms.

If the individual blobs are well-separated, then post-processing with
a closing operation could still work well.

Note that there is a class of instance-segmentation algorithms
that rely on “enhanced” semantic segmentation followed by
post-processing (for example, StarDist, which uses a U-Net for
its initial semantic-segmentation step).

I would recommend using BCEWithLogitsLoss (possibly with
pos_weight, if you have a significant foreground-background
imbalance) and only add something like Tversky loss to it if you
can show that doing so improves performance.
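
For example (the pos_weight value below is only a placeholder for your
actual background-to-foreground pixel ratio):

```python
import torch

# pos_weight ~ (number of background pixels) / (number of foreground pixels);
# 20.0 is only a placeholder for your actual imbalance
criterion = torch.nn.BCEWithLogitsLoss(pos_weight=torch.tensor(20.0))

logits = torch.randn(4, 1, 128, 128)   # stand-in for raw U-Net output
targets = torch.randint(0, 2, (4, 1, 128, 128)).float()
loss = criterion(logits, targets)
```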

*) You could look at adding a topological-loss term to your overall
loss, but this might be overkill, and it is something of a nuisance to
implement.

Best.

K. Frank


Indeed, in my case it is guaranteed that there is a single blob.
Are there models specialized in single-instance segmentation?

Basically I’d like to have a bounding box.
Yet I don’t want a fully-featured object-detection model with anchors and a grid.
Just a simple loss that also penalizes the predicted bounding box for differing from the bounding box of the reference mask.

Hi Avi!

If you know that each image contains exactly one blob (that is, one
instance), then semantic segmentation is, by definition, instance
segmentation. So a semantic-segmentation model (such as U-Net)
is just what you want.

If you get your predicted pixels right, the bounding box that you trivially
compute from those pixels will be right. So, focus on getting your
semantic-segmentation to work well.
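
(Computing that box from the predicted mask is then trivial; a sketch,
assuming at least one foreground pixel:)

```python
import torch

def bbox_from_mask(mask):
    # mask: (H, W) boolean tensor with at least one True pixel
    ys, xs = torch.nonzero(mask, as_tuple=True)
    return ys.min().item(), xs.min().item(), ys.max().item(), xs.max().item()
```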

I don’t understand why you are so focused on the bounding box. It’s
not really fundamental – it’s more a technical ingredient of certain
detection and instance-segmentation algorithms.

Suppose that the bounding box you compute from your predicted pixels
is, say, two or three pixels too high (but that the predicted pixels are
altogether quite accurate). Now suppose that you add a bounding-box
loss term to your overall loss and now your predicted bounding box is
exactly right, but, say, 500 pixels (that all happen to lie within the
correct bounding box) are predicted incorrectly. Is that really a better
outcome for your use case?

When you change your loss to make some aspect of your predictions
better, you will – all else being equal – make some other aspect of your
predictions worse. After all, you’ve told your model – via your choice of
loss function – that you care more about the precise bounding box,
rather than the specific pixels.

Best.

K. Frank

@KFrank, the problem I face is that I get many small blobs outside the area of the object.
Those are false alarms.

If I make the model limit itself to a rectangle, it will mitigate the issue.
I don’t want a fully-featured object detector, just a smart and effective way to connect the bounding box of the prediction to the bounding box of the ground-truth labels.

Is there a way to define such a loss that is gradient-friendly?
Given the output mask of the model, how can I build a regression model for the bounding box?

Hi Avi!

Well, you haven’t told us anything about your actual use case, so it’s
hard to say …

It sounds like you have a large “blob” (whatever that means in your
actual use case …) that you wish to segment as foreground and
several smaller blobs, which, other than their sizes, look similar to the
large blob and which you wish to segment as background.

Note that a U-Net model has a “field of view.” That is, any single pixel
in the output depends only on the input pixels that are in the output
pixel’s field of view. For example, roughly speaking, if your U-Net
has five layers that downsample by a factor of two, any given output
pixel will only depend on a 32x32 patch of input pixels centered on
the output pixel.

If your field of view is not large enough to more-or-less contain the
largest of your “small” background blobs (and, except for their sizes,
foreground blobs and background blobs look the same), then your
U-Net won’t have enough information in its field of view to distinguish
small blobs from large.

You can increase the field of view of your U-Net by adding more
downsampling layers or by downsampling by a larger factor in one or
more of your layers. For example, if you downsampled by a factor of
four in five layers, your field of view would become, roughly speaking,
1024x1024.
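
As a rough back-of-the-envelope check (this deliberately ignores the extra
growth contributed by the convolution kernels themselves):

```python
import math

# rough linear field of view = product of per-layer downsampling factors
print(math.prod([2] * 5))  # 32:   five layers, factor two each
print(math.prod([4] * 5))  # 1024: five layers, factor four each
```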

The standard approach for regressing bounding boxes is to use fixed
anchors and regress differentiable offsets. I don’t know of a slick way
of doing anything simpler.
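
If you do want to try it, here is a minimal sketch of a single-anchor
version (the head, the names, and the sizes are my own assumptions, not a
standard recipe):

```python
import torch
import torch.nn as nn

class SingleAnchorBoxHead(nn.Module):
    # Regresses (y1, x1, y2, x2) corner offsets relative to one
    # fixed anchor box (e.g. centered in the image).
    def __init__(self, in_channels, anchor):
        super().__init__()
        self.register_buffer("anchor", torch.tensor(anchor))
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(in_channels, 4)

    def forward(self, features):
        # features: (N, C, H, W) feature map from the U-Net
        x = self.pool(features).flatten(1)
        return self.anchor + self.fc(x)  # predicted (y1, x1, y2, x2)

# usage sketch: smooth-L1 regression against the ground-truth box
head = SingleAnchorBoxHead(64, anchor=[32.0, 32.0, 96.0, 96.0])
feats = torch.randn(2, 64, 128, 128)
pred_boxes = head(feats)
gt_boxes = torch.tensor([[30.0, 28.0, 100.0, 95.0],
                         [20.0, 40.0, 90.0, 110.0]])
box_loss = nn.functional.smooth_l1_loss(pred_boxes, gt_boxes)
```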

But it sounds like you are trying to reinvent the wheel. Whether or not
your problem is technically instance segmentation, with your small blobs
vs. large blobs, it is certainly similar. So if you want something akin to
instance segmentation and you want to regress bounding boxes, you
ought to look at bounding-box-based instance segmentation models.
Mask R-CNN is one such.

Best.

K. Frank


@KFrank ,

Appreciate the good dialogue.

I am aware of the concept of the receptive field in CNNs.
I don’t think that is the issue in my case.

Problem-wise, you may think of it as segmenting a single object within a frame.
Namely, I know a priori that it appears once in the image.

An example: segment a t-shirt in an image.
The shirt appears only once in the image.

Yet the U-Net segments most of the shirt, but also produces small, false-alarm blobs in the background.

I thought about anchors.
Can I set a single one at the center of mass of the largest blob of the U-Net output?

The motivation is to take advantage of the strong priors I have:

  1. The blob appears only once in each image.
  2. The object to segment is a connected set.

Taking advantage of those two assumptions should make the results better.
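
As a baseline, assumption 1 can already be exploited at inference time by
keeping only the largest connected component of the predicted mask (a
sketch using scipy, assuming the output was already thresholded to a
binary mask):

```python
import numpy as np
from scipy import ndimage

def keep_largest_blob(mask):
    # mask: (H, W) boolean prediction; keep only its largest connected
    # component, dropping the small false-alarm blobs
    labels, num = ndimage.label(mask)
    if num == 0:
        return mask
    sizes = ndimage.sum(mask, labels, index=range(1, num + 1))
    return labels == (np.argmax(sizes) + 1)
```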