Need advice on training object detection

I have a dataset of 433 images with the following per-class instance counts (multiple classes can appear in one image):
'a': 634,
'b': 501,
'c': 590,
'd': 293,
'e': 524,
'f': 262
I know my dataset is too small, so I applied several image augmentation techniques and got 2598 images (not counting the originals):
'a': 3804,
'b': 3006,
'c': 3540,
'd': 1757,
'e': 3142,
'f': 1569

I'm using SSD as the detector with a MobileNet v1 backbone, based on this repo:

SSD has two loss functions: box regression with Smooth L1 and classification with cross-entropy.
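For reference, the combined objective looks roughly like this (a minimal PyTorch sketch with hard-negative mining omitted; the tensor names are placeholders, not the repo's actual code):

import torch.nn.functional as F

def multibox_loss(confidence, locations, gt_labels, gt_boxes):
    # classification: cross-entropy over all priors (batch, priors, classes)
    cls_loss = F.cross_entropy(confidence.reshape(-1, confidence.size(-1)),
                               gt_labels.reshape(-1), reduction='sum')
    # localization: Smooth L1, only for priors matched to a ground-truth box
    pos_mask = gt_labels > 0
    loc_loss = F.smooth_l1_loss(locations[pos_mask], gt_boxes[pos_mask],
                                reduction='sum')
    num_pos = pos_mask.sum().clamp(min=1)
    return loc_loss / num_pos, cls_loss / num_pos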
I trained on my dataset with an 80% training / 20% testing split. I forgot to plot the training and validation losses, but I'm fairly sure the model did not overfit, since I stopped training once the loss started to increase. I ended up with around 0.5 for the regression loss and 1.5 for the classification loss. I trained from scratch, without pretrained weights, because that repo only provides a VOC pretrained model.
When I test the model with a live camera, detection accuracy is poor: it does detect objects, but only with confidence below 0.5, and it fails under even slight occlusion, for example when I grab the object with my hand.
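My live-cam test is roughly this loop (an OpenCV sketch; predict is a stand-in for the repo's actual inference call, not its real name):

import cv2

cap = cv2.VideoCapture(0)  # default webcam
while True:
    ok, frame = cap.read()
    if not ok:
        break
    boxes, labels, probs = predict(frame)  # stand-in for the repo's inference function
    for box, label, prob in zip(boxes, labels, probs):
        if prob < 0.5:  # most of my detections fall below this threshold
            continue
        x1, y1, x2, y2 = map(int, box)
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
        cv2.putText(frame, f"{label}: {prob:.2f}", (x1, y1 - 5),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
    cv2.imshow("detections", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()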

Here is the mAP I get at a confidence threshold of 0.5.

mAP 33.68%:

'a': 0.38, 
'b': 0.45, 
'c': 0.27, 
'd': 0.65, 
'e': 0.16, 
'f': 0.11
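
(The overall number is just the unweighted mean of these per-class APs:)

aps = {'a': 0.38, 'b': 0.45, 'c': 0.27, 'd': 0.65, 'e': 0.16, 'f': 0.11}
print(sum(aps.values()) / len(aps))  # ~0.3367; matches the 33.68% above up to per-class rounding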

What should I do to improve my model? Did I handle the dataset incorrectly? Thank you.

Are you including random jittering, random cropping, random rotation, and random Gaussian noise to help your model learn better?

Hi @Scott_Hoang, yes, I'm using the imgaug library with these augmenters:

import imgaug.augmenters as iaa

blur = (3, 7)  # motion-blur kernel size; the value here is a placeholder, mine is set elsewhere

# apply 2 randomly chosen augmenters from this list per image
aug = iaa.SomeOf(2, [
    iaa.AdditiveGaussianNoise(scale=(0, 0.1 * 255)),
    iaa.Add((-40, 40), per_channel=True),
    iaa.Sharpen(alpha=(0, 0.5)),
    iaa.Dropout(p=(0, 0.15)),
    iaa.GaussianBlur(sigma=(0.0, 1.5)),
    iaa.MotionBlur(k=blur),
])

and combined with:

random_shear
random_rotate
random_scale
random_horizontal_flip
random_vertical_flip
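
Since this is detection, I transform the boxes together with the image; applying the augmenters looks roughly like this (a sketch; the file name and box coordinates are placeholders):

import imageio
from imgaug.augmentables.bbs import BoundingBox, BoundingBoxesOnImage

image = imageio.imread("sample.jpg")  # placeholder path
boxes = BoundingBoxesOnImage([
    BoundingBox(x1=65, y1=100, x2=200, y2=150, label='a'),  # placeholder box
], shape=image.shape)

image_aug, boxes_aug = aug(image=image, bounding_boxes=boxes)
# geometric transforms (shear, rotate, scale, flips) can push boxes partly outside the image
boxes_aug = boxes_aug.remove_out_of_image().clip_out_of_image()

The photometric augmenters in the SomeOf block leave the boxes unchanged; it's the geometric transforms listed above that actually move them.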

Also, my objects vary in size: some are small (though not tiny) and some are large.

Hi. Since you are not using pre-trained weights as a starting point, that is a likely reason for this performance. There are two options here:

  1. Use the pre-trained weights, freeze some backbone layers, and fine-tune (transfer learning) on your images; you can also try it without freezing any layers. See the sketch after this list.
  2. If you would rather train from scratch without pre-trained weights, increase your dataset size instead.

Also, to handle cases like occlusion, add more data that represents those cases, or use augmentations like CutMix, Mosaic, etc., which should help.
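A minimal sketch of option 1 in PyTorch; the constructor name, checkpoint file name, and the base_net attribute are assumptions about your repo, not its actual API:

import torch

model = create_mobilenetv1_ssd(num_classes=7)  # hypothetical constructor; 6 classes + background
state = torch.load("mobilenet-v1-ssd-voc.pth", map_location="cpu")  # assumed VOC checkpoint name
model.load_state_dict(state, strict=False)  # strict=False: the VOC heads predict a different class count

# freeze the MobileNet backbone, train only the SSD heads
for param in model.base_net.parameters():  # attribute name is an assumption
    param.requires_grad = False

optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad),
    lr=1e-3, momentum=0.9, weight_decay=5e-4,
)

For occlusion specifically, besides CutMix/Mosaic, imgaug (0.4 and later) also has Cutout, which pastes random filled rectangles over the image as a cheap occlusion proxy:

import imgaug.augmenters as iaa

occlude = iaa.Cutout(nb_iterations=(1, 3), size=0.2, squared=False)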
Hope this helps.