I am trying to set up a single-class detection model. My goal is to detect drones. The only annotated images I have are images in which a drone appears. I also have images without drones, but those have no annotations.
I tried to train my network with two classes, 0 → background and 1 → drone.
But the PyTorch model can't take an empty bbox for the background class. I could set the bbox to the size of the image, but I don't think that is a good solution.
My question is: is it reasonable to train my model only on drone images and test/evaluate on background (no-drone) images?
Can my model learn good features if I only train with one class? I'm afraid that at inference my model will confuse a drone with other flying objects (birds, helicopters…).
It's more of a theoretical question, but I'm a beginner and I'm trying to understand what is best for this case.
Thanks a lot
No. As a general rule it is important to train your model with data that
is fully representative of the real-world data you will be feeding to your
model for inference.
To me, “detection” means (and I think this is the conventional usage)
locating zero, one, or more instances of an object of a given type within
the image being analyzed (or locate and identify objects of multiple
types if you are doing multi-class detection).
If you train your network on images that have exactly one drone, then
your model will likely misidentify any little wisp of cloud as a drone when
you analyze images with no drones. Similarly, it will likely miss a drone
when you analyze images with two drones.
(This is less likely to be the case if your detector is built with the older
sliding-window / binary-classifier architecture.)
If your model “detects” drones in images without any, you want to
penalize it for doing so – during training – so you want to have images
without drones in your training set.
If you only train on drones, your model will likely learn good features for
identifying flying “stuff” and misidentify birds and helicopters as drones.
So, ideally, you will want to train on images that have nothing, just birds,
just drones (sometimes more than one), birds and drones, etc.
That way, your model will learn good features that not only detect
flying stuff but also distinguish drones from other flying stuff (such
as birds and helicopters).
This seems like a problem with the model.
For example, pytorch’s torchvision Faster R-CNN model takes as its
training targets (annotations) a list of object instances in the image.
I would expect (but I haven’t checked) that this list can be empty.
I understand the need to train my model with images that do not contain my 'drone' class, and to have a dataset as close to the real world as possible.
However, I think my problem lies in the model, since unfortunately a detection model trains with a classification loss AND a bounding-box regression loss. So for the examples with no object (background only), I have to supply a bounding box. The most intuitive workaround for me is a bbox the size of the image… It's not ideal, but I don't see how to train my detection model properly without a regression target for my background examples.
Thanks again for your answer