Hi Avi!
Well, you haven’t told us anything about your actual use case, so it’s
hard to say …
It sounds like you have a large “blob” (whatever that means in your
actual use case …) that you wish to segment as foreground, together
with several smaller blobs that you wish to segment as background,
where the small blobs look the same as the large blob except for their
size.
Note that a U-Net model has a “field of view.” That is, any single pixel
in the output depends only on the input pixels that are in the output
pixel’s field of view. For example, roughly speaking, if your U-Net
has five layers that downsample by a factor of two, any given output
pixel will only depend on a 32x32 patch of input pixels centered on
the output pixel.
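If it helps, here is that arithmetic as a throwaway python snippet (the
helper name is mine, and, roughly speaking, this ignores the extra
context that the convolution kernels themselves add):

```python
def approx_field_of_view(downsample_factor, num_layers):
    # Each downsampling layer multiplies the output stride by the
    # downsample factor, so after num_layers layers an output pixel
    # "sees" roughly a square input patch of this side length.
    return downsample_factor ** num_layers

print(approx_field_of_view(2, 5))   # 32 -> roughly a 32x32 patch
```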
If your field of view is not large enough to more-or-less contain the
largest of your “small” background blobs (and, except for their sizes,
foreground blobs and background blobs look the same), then your
U-Net won’t have enough information in its field of view to distinguish
small blobs from large.
You can increase the field of view of your U-Net by adding more
downsampling layers or by downsampling by a larger factor in one or
more of your layers. For example, if you downsampled by a factor of
four in five layers, your field of view would become, roughly speaking,
1024x1024.
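As a concrete (toy) illustration, here are two small pytorch encoder
stacks that differ only in their pooling factor; neither is meant to be
your actual model:

```python
import torch
import torch.nn as nn

# Toy encoders: five blocks, each downsampling by the given factor.
# With factor 2 the output stride is 2**5 = 32; with factor 4 it is
# 4**5 = 1024, so the field of view grows accordingly.
def encoder(downsample_factor, num_layers=5, channels=8):
    layers = [nn.Conv2d(1, channels, kernel_size=3, padding=1)]
    for _ in range(num_layers):
        layers += [
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(downsample_factor),  # this is where the factor matters
        ]
    return nn.Sequential(*layers)

x = torch.randn(1, 1, 1024, 1024)
print(encoder(2)(x).shape)   # torch.Size([1, 8, 32, 32])
print(encoder(4)(x).shape)   # torch.Size([1, 8, 1, 1])
```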
The standard approach for regressing bounding boxes is to use fixed
anchors and regress differentiable offsets. I don’t know of a slick way
of doing anything simpler.
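For concreteness, here is a sketch of the usual Faster-R-CNN-style
decoding, where the network regresses offsets (dx, dy, dw, dh) relative
to a fixed anchor; the function name is mine:

```python
import torch

# Standard anchor-offset decoding. anchors and deltas are (N, 4)
# tensors; anchors are (cx, cy, w, h) and deltas are the (dx, dy,
# dw, dh) the network regresses. The decoding is differentiable,
# so it can sit inside the training graph.
def decode_boxes(anchors, deltas):
    cx = anchors[:, 0] + anchors[:, 2] * deltas[:, 0]
    cy = anchors[:, 1] + anchors[:, 3] * deltas[:, 1]
    w = anchors[:, 2] * torch.exp(deltas[:, 2])
    h = anchors[:, 3] * torch.exp(deltas[:, 3])
    return torch.stack([cx, cy, w, h], dim=1)

anchors = torch.tensor([[64.0, 64.0, 32.0, 32.0]])
deltas = torch.tensor([[0.1, -0.2, 0.0, 0.3]])  # e.g., a network's output
print(decode_boxes(anchors, deltas))
```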
But it sounds like you are trying to reinvent the wheel. Whether or not
your problem, with its small blobs vs. large blobs, is technically
instance segmentation, it is certainly similar to one. So if you want
something akin to instance segmentation and you want to regress
bounding boxes, you ought to look at bounding-box-based
instance-segmentation models. Mask R-CNN is one such.
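For what it’s worth, torchvision ships a pretrained Mask R-CNN that you
can try out of the box. A minimal sketch (note that pretrained=True is
deprecated in favor of a weights enum in newer torchvision versions):

```python
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

# Load torchvision's pretrained Mask R-CNN and put it in eval mode.
model = maskrcnn_resnet50_fpn(pretrained=True)
model.eval()

# One fake 3-channel image, values in [0, 1], as the model expects.
image = torch.rand(3, 512, 512)

with torch.no_grad():
    predictions = model([image])

# Each prediction is a dict with boxes, labels, scores, and
# per-instance masks.
print(predictions[0]["boxes"].shape)   # (num_detections, 4)
print(predictions[0]["masks"].shape)   # (num_detections, 1, 512, 512)
```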
Best.
K. Frank