I have a dataset of 433 images with the following number of objects per class (one image can contain multiple classes):
‘a’: 634,
‘b’: 501,
‘c’: 590,
‘d’: 293,
‘e’: 524,
‘f’: 262
I know my dataset is too small, so I applied several image augmentation techniques and got 2598 images, not counting the original images:
‘a’: 3804,
‘b’: 3006,
‘c’: 3540,
‘d’: 1757,
‘e’: 3142,
‘f’: 1569
I'm using SSD as the detector with MobileNet v1 as the backbone, based on this repo:
SSD has two loss functions: a regression loss (Smooth L1) and a classification loss (cross-entropy).
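For reference, here is a minimal plain-Python sketch of what those two terms compute for a single box coordinate and a single anchor. The function names and the `beta` threshold are my own illustrative assumptions; the repo's actual implementation (for example, how it selects negative anchors and averages over boxes) may differ.

```python
import math

def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 for one box coordinate: quadratic near zero, linear beyond beta."""
    diff = abs(pred - target)
    if diff < beta:
        return 0.5 * diff * diff / beta
    return diff - 0.5 * beta

def cross_entropy(logits, target_index):
    """Softmax cross-entropy for one anchor: -log(softmax(logits)[target_index])."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_index]

# Example: a small and a large regression error, and a 3-way classification.
small_error = smooth_l1(0.5, 0.0)
large_error = smooth_l1(3.0, 1.0)
class_loss = cross_entropy([2.0, 0.5, -1.0], target_index=0)
```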
I split my dataset into 80% training and 20% testing. I forgot to plot the training and validation loss, but I'm fairly sure the model is not overfitting, because I stopped training when the loss started to increase. I ended up with around 0.5 for the regression loss and 1.5 for the classification loss. I trained from scratch, without a pretrained model, because that repo only provides a VOC pretrained model.
When I test the model with a live camera, detection accuracy is not good: it can detect objects, but only with a probability below 0.5, and it fails with even a little occlusion, for example when I hold the object in my hand.
Hi. Training without the pre-trained model weights as a starting point is a possible reason for this performance. There are 2 options here:
First, you can start from the pre-trained model weights, freeze some of the backbone layers, and do transfer learning on your images. You can also try it without freezing any layers.
Second, if you would rather train from scratch than use the pre-trained model weights, increase your dataset size.
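To illustrate the first option, here is a rough PyTorch sketch of freezing a backbone and training only the detection heads. `TinySSD` and its attribute names (`base_net`, `classification_head`, `regression_head`) are hypothetical stand-ins, not the API of the repo you are using; with a real model you would load the pre-trained checkpoint first and adapt the attribute names.

```python
import torch
from torch import nn

class TinySSD(nn.Module):
    """Hypothetical stand-in for an SSD model: a backbone plus two heads."""
    def __init__(self, num_classes=7):  # assumed: 6 object classes + background
        super().__init__()
        self.base_net = nn.Sequential(
            nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8), nn.ReLU(),
            nn.Conv2d(8, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU(),
        )
        self.classification_head = nn.Conv2d(16, num_classes, 3, padding=1)
        self.regression_head = nn.Conv2d(16, 4, 3, padding=1)

    def forward(self, x):
        feats = self.base_net(x)
        return self.classification_head(feats), self.regression_head(feats)

def freeze_backbone(model):
    """Disable gradients for the backbone and return the remaining trainable parameters."""
    for param in model.base_net.parameters():
        param.requires_grad = False
    return [p for p in model.parameters() if p.requires_grad]

model = TinySSD()
# With a real model, the pre-trained weights would be loaded here, before freezing.
trainable = freeze_backbone(model)
# Only the head parameters are handed to the optimizer.
optimizer = torch.optim.SGD(trainable, lr=1e-3, momentum=0.9)
```

Note that BatchNorm running statistics in the backbone still update in training mode unless those layers are also put in eval mode. To try the "no freezing" variant, skip `freeze_backbone` and pass `model.parameters()` to the optimizer, usually with a smaller learning rate.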
Also, to handle cases like occlusion, try adding more data that represents those cases, or use augmentations such as CutMix or Mosaic, which should help.
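As a simple starting point for occlusion, here is a NumPy sketch that covers a random part of a ground-truth box with noise. This is a cutout / random-erasing style augmentation, not CutMix or Mosaic themselves, and `occlude_box` with its parameters is a hypothetical helper I made up for illustration. It assumes integer pixel boxes in `(x1, y1, x2, y2)` format and an `H x W x C` uint8 image.

```python
import numpy as np

def occlude_box(image, box, max_frac=0.4, rng=None):
    """Return a copy of `image` with a random noise patch placed inside `box`.

    The box annotation itself is left unchanged: the object is still there,
    just partly hidden, which is what the detector should learn to handle.
    """
    if rng is None:
        rng = np.random.default_rng()
    x1, y1, x2, y2 = box
    box_w, box_h = x2 - x1, y2 - y1
    # Patch covers between 10% and max_frac of each box side (at least 1 pixel).
    patch_w = max(1, int(box_w * rng.uniform(0.1, max_frac)))
    patch_h = max(1, int(box_h * rng.uniform(0.1, max_frac)))
    # Pick the top-left corner so that the patch stays inside the box.
    px = int(rng.integers(x1, x2 - patch_w + 1))
    py = int(rng.integers(y1, y2 - patch_h + 1))
    out = image.copy()
    noise = rng.integers(0, 256, size=(patch_h, patch_w, image.shape[2]), dtype=np.uint8)
    out[py:py + patch_h, px:px + patch_w] = noise
    return out

# Example on a blank image with one box.
image = np.zeros((100, 100, 3), dtype=np.uint8)
augmented = occlude_box(image, (20, 30, 80, 90), rng=np.random.default_rng(0))
```

Instead of noise, you could paste a crop of a hand or of another training image into the patch, which is closer to what CutMix does and closer to your "object held in hand" case.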
Hope this helps.