It just occurred to me: if I’m working on an object detection task and some of the training images contain no objects at all, can this work? If so, how does it work?
Yes, but how best to do this will depend on the overall structure of your problem.
First, to me, “object detection” means finding one or more objects in
an image (and reporting their locations, e.g., their bounding boxes).
This is often done by sliding a series of “detection” windows across the
image and then running the sub-image in such a window through an
object classifier. I won’t say more about such full-scale object detectors
(used in this sense) other than to note that they are built out of smaller
classifiers.
If you have a three-class classification problem – e.g., this image
contains either a “person” or a “dog” or a “shrubbery” – then, in
order to be able to accept “background-only” images, you would
add a fourth class, “background,” and train your classifier in the
usual way with all four classes.
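As a minimal sketch of that four-class setup (the class indices and the tiny dummy batch are my own illustration, not from the thread): a background-only image is just a sample whose label is the extra “background” class, and it goes through `CrossEntropyLoss` like any other sample.

```python
import torch
import torch.nn as nn

# Hypothetical class indices: 0 = person, 1 = dog, 2 = shrubbery, 3 = background.
NUM_CLASSES = 4

# Dummy logits from some classifier for a batch of two images.
logits = torch.randn(2, NUM_CLASSES)

# First image contains a dog (class 1); second image is background-only
# (class 3) -- it is trained on exactly like any other class.
targets = torch.tensor([1, 3])

loss = nn.CrossEntropyLoss()(logits, targets)
```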
If you have a multi-label, three-class classification problem – e.g., this
image does or does not contain a “person” and also does or does
not contain a “dog” and also does or does not contain a “shrubbery”
(that is, your image can be “true” for any number of your three classes,
including for none of them or all of them) – then you don’t have to do
anything additional to account for training images containing no
objects (other than to make sure that they are labelled correctly).
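In the multi-label case, an image with no objects simply gets an all-zeros target vector. A minimal sketch with `BCEWithLogitsLoss` (the label ordering and dummy batch are assumptions for illustration):

```python
import torch
import torch.nn as nn

# Three independent binary labels: [person, dog, shrubbery].
logits = torch.randn(2, 3)

# First image: person and dog present, no shrubbery.
# Second image: no objects at all -- its target is just all zeros.
targets = torch.tensor([[1.0, 1.0, 0.0],
                        [0.0, 0.0, 0.0]])

loss = nn.BCEWithLogitsLoss()(logits, targets)
```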
Thanks a lot for your detailed reply! One more question about the implementation details: how should I annotate the “background-only” images? In other words, what should the ground truth look like? An empty bbox list for an object detection task? An “all-zeros” mask for a segmentation task? Or something else?
It really depends on the specific problem you are trying to solve.
Are you working on a single-image multi-class classifier? Are
you working on image segmentation? We can make suggestions
about such implementation details if you give us some concrete
details about the specific model you are trying to implement.
Well, okay, that’s my bad. (Actually, as I said at the very beginning, it just popped into my mind.)
We can suppose some scenarios:
- I’m trying to locate all cats and dogs in an image with bboxes only, using faster-rcnn.
- I’m trying to segment all cars and persons in an image with masks, using deeplabv3.
- I’m working on instance segmentation using mrcnn.
Really grateful to your patience.
Hello Michael, it is possible to do that.
For your example 1:
In your dataset’s `__getitem__` method, for images with background only you could set an empty box array (note that `np.array()` with no argument raises an error, so build the empty array from an empty list):

```python
box_count = 0
bboxes = np.array([]).reshape(-1, 4)
```

And convert them like any other image:

```python
bboxes = torch.as_tensor(bboxes, dtype=torch.float32)
labels = torch.ones((box_count,), dtype=torch.int64)
```
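Putting those pieces together, here is a hedged sketch of a `Dataset` whose `__getitem__` returns empty targets for background-only images, in the `{"boxes", "labels"}` dict format that torchvision’s Faster R-CNN expects. The class name `CatsDogsDataset` and the `(image, boxes, labels)` sample structure are my own assumptions, not from the thread:

```python
import numpy as np
import torch
from torch.utils.data import Dataset

class CatsDogsDataset(Dataset):
    """Hypothetical dataset: `samples` is a list of (image, boxes, labels),
    where boxes are [x1, y1, x2, y2] lists and labels are per-box class ids."""

    def __init__(self, samples):
        self.samples = samples

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        image, boxes, labels = self.samples[idx]
        # For a background-only image, boxes and labels are empty lists;
        # reshape(-1, 4) yields a (0, 4) array, which torchvision's
        # Faster R-CNN accepts as a valid empty target.
        bboxes = np.asarray(boxes, dtype=np.float32).reshape(-1, 4)
        target = {
            "boxes": torch.as_tensor(bboxes, dtype=torch.float32),
            "labels": torch.as_tensor(labels, dtype=torch.int64),
        }
        return image, target
```

For a background-only sample, `target["boxes"]` ends up with shape `(0, 4)` and `target["labels"]` with shape `(0,)`.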
Okay, thanks! I’ll try it.