Hi, I have been developing an object detection model using PyTorch. I have 5,000 images of a single class; there is only one class that I want to detect in camera frames.
When I train it for 200-300 epochs and the loss stops decreasing, I run inference with the camera. It detects the wanted objects very well, but it also makes false-positive predictions with 0.99 confidence on completely unrelated objects. These objects are usually one solid primary color: fully red, green, or blue.
I have taken photos of these problematic objects, augmented them together with my wanted objects, labeled them, and added them to the dataset, but it did not help; the same situation continues.
1) The augmented images are 1280p, but my camera feed is 480p. Can this be the reason?
2) Could some slightly blurred images in my dataset cause this?
Hope you have a good day.
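One way to rule out the resolution mismatch from question 1 is to letterbox the camera frames to the training resolution before inference, so the model sees the same geometry in both cases. A minimal NumPy sketch (the 640 target size and the grey padding value 114 are assumptions; use whatever your actual training pipeline does):

```python
import numpy as np

def letterbox(frame: np.ndarray, size: int = 640) -> np.ndarray:
    """Resize the longer side to `size` and pad the rest with grey,
    so 480p camera frames get the same geometry as training images.
    `size` and the pad value 114 are placeholders for your pipeline."""
    h, w = frame.shape[:2]
    scale = size / max(h, w)
    nh, nw = int(round(h * scale)), int(round(w * scale))
    # nearest-neighbour resize via index sampling (no external deps)
    ys = (np.arange(nh) / scale).astype(int).clip(0, h - 1)
    xs = (np.arange(nw) / scale).astype(int).clip(0, w - 1)
    resized = frame[ys][:, xs]
    canvas = np.full((size, size, frame.shape[2]), 114, dtype=frame.dtype)
    top, left = (size - nh) // 2, (size - nw) // 2
    canvas[top:top + nh, left:left + nw] = resized
    return canvas

frame = np.zeros((480, 640, 3), dtype=np.uint8)  # a 480p camera frame
out = letterbox(frame, 640)
print(out.shape)  # (640, 640, 3)
```

The key point is that the same function runs on both the dataset images and the live camera frames, so neither domain gets stretched differently.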
How did the validation loss look during training? Did you see any overfitting when comparing the training and validation loss curves?
Also, do your training and validation sets contain similar images with these specific colors?
The validation loss fluctuates: it decreases steadily for the first 20 epochs, then fluctuates, but the overall downward trend continues.
The val loss is always a bit lower than the training loss; I think that is because of the augmentation.
Yes, this is the most interesting part. I added the photos of the problematic objects to the training dataset. When I feed them in as test inputs, the model does not predict them as positive, but when I show the same objects through my camera, it gives false positives. It also predicts a positive on the fully black frame when I cover the camera lens. I am really confused.
If I understand the follow-up correctly, you are seeing different behavior depending on whether you feed the object images through your “training pipeline” or through your camera?
If that’s the case, then this might be expected; I’ve worked with models in the past that were sensitive even to different JPEG decoding setups.
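As a small illustration of how much a single pipeline step can shift pixels, here is a sketch that round-trips a synthetic frame through JPEG with Pillow and measures the change (the quality setting of 90 is just an assumption; your camera driver and decoder may differ further):

```python
import io
import numpy as np
from PIL import Image

# Round-trip a synthetic frame through lossy JPEG encoding, then
# measure how far the decoded pixels drift from the originals.
# This is one reason "the same" image can behave differently when it
# reaches the model via disk files vs. via a camera capture path.
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)

buf = io.BytesIO()
Image.fromarray(frame).save(buf, format="JPEG", quality=90)
decoded = np.asarray(Image.open(io.BytesIO(buf.getvalue())))

max_diff = int(np.abs(frame.astype(int) - decoded.astype(int)).max())
print(max_diff)  # nonzero: the round trip is not pixel-identical
```

If the model is sensitive to these small shifts, it can score the same scene very differently depending on which decoding path produced the tensor.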
Could you create a (small) dataset captured by your camera, label it, and add it to the training and validation sets? The difference between the sample domains (training images vs. camera captures) might introduce enough of a signal shift that your model is unable to properly predict the camera images.
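A minimal sketch of folding such camera captures into both splits, using only the standard library (the folder layout, file extension, and `val_frac` are assumptions; adapt them to your dataset format):

```python
import random
import shutil
from pathlib import Path

def split_camera_set(src: str, dst: str, val_frac: float = 0.2, seed: int = 0):
    """Shuffle camera captures from `src` and copy them into
    <dst>/train/images and <dst>/val/images, so the camera domain
    is represented in *both* splits. Returns (n_val, n_train)."""
    imgs = sorted(Path(src).glob("*.jpg"))
    random.Random(seed).shuffle(imgs)          # deterministic shuffle
    n_val = max(1, int(len(imgs) * val_frac))  # at least one val image
    for i, img in enumerate(imgs):
        part = "val" if i < n_val else "train"
        out = Path(dst) / part / "images"
        out.mkdir(parents=True, exist_ok=True)
        shutil.copy2(img, out / img.name)
    return n_val, len(imgs) - n_val
```

Having camera frames in the validation set matters here: otherwise the val loss can keep looking fine while the camera-domain false positives go unmeasured.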
Thank you, sir. I will try to capture images with my camera and create a dataset from them.