Issue getting a good mAP score for vehicle sub-parts like fenders with MobileNetV2 + SSD using PyTorch

Hi Team,

I am working with an SSD + MobileNetV2 architecture to detect vehicle sub-parts such as the bumper, fender, quarter panel, etc.

I am using almost the same architecture as the GitHub repo qfgaohao/pytorch-ssd (MobileNetV1, MobileNetV2, and VGG based SSD/SSD-lite implementation in PyTorch). I am able to detect objects like the Front Bumper (50% AP) and Rear Bumper (90% AP). However, the other parts are proving challenging, especially the front and rear Fenders, which have fewer distinctive features than the other sub-parts.

Best average validation loss: 1.95 (regression loss: 0.45, classification loss: ~1.5)

Class-wise AP:
AP: 23.66% (door)
AP: 41.98% (frontBumper)
AP: 0.48% (frontFender)
AP: 38.65% (hood)
AP: 94.74% (rearBumper)
AP: 0.74% (rearFender)
mAP: 33.38%
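
For reference, the mAP above is the unweighted mean of the six class APs, so the two fender classes alone pull it down a lot. A minimal sketch of that arithmetic (the names `class_ap`, `mean_ap` and `undetected` are only for illustration, they are not from the repo):

```python
# Per-class AP values (in percent), copied from the list above.
class_ap = {
    "door": 23.66,
    "frontBumper": 41.98,
    "frontFender": 0.48,
    "hood": 38.65,
    "rearBumper": 94.74,
    "rearFender": 0.74,
}

# mAP is the unweighted mean of the per-class APs.
mean_ap = sum(class_ap.values()) / len(class_ap)

# Classes with AP below 1% are effectively not being detected at all.
undetected = [name for name, ap in class_ap.items() if ap < 1.0]

print(f"mAP: {mean_ap:.2f}%")
print("AP below 1%:", undetected)
```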

Architecture changes made:
The first layers of the classification and regression headers are commented out:
classification_headers = ModuleList([
#SeperableConv2d(in_channels=round(576 * width_mult), out_channels=6 * num_classes, kernel_size=3, padding=1),
#SeperableConv2d(in_channels=1280, out_channels=6 * num_classes, kernel_size=3, padding=1),
SeperableConv2d(in_channels=512, out_channels=prior_count * num_classes, kernel_size=3, padding=1),
SeperableConv2d(in_channels=256, out_channels=prior_count * num_classes, kernel_size=3, padding=1),
SeperableConv2d(in_channels=256, out_channels=prior_count * num_classes, kernel_size=3, padding=1),
Conv2d(in_channels=64, out_channels=prior_count * num_classes, kernel_size=1),
])
regression_headers = ModuleList([
#SeperableConv2d(in_channels=round(576 * width_mult), out_channels=6 * 4,
#kernel_size=3, padding=1, onnx_compatible=False),
#SeperableConv2d(in_channels=1280, out_channels=6 * 4, kernel_size=3, padding=1, onnx_compatible=False),
SeperableConv2d(in_channels=512, out_channels=prior_count * 4, kernel_size=3, padding=1, onnx_compatible=False),
SeperableConv2d(in_channels=256, out_channels=prior_count * 4, kernel_size=3, padding=1, onnx_compatible=False),
SeperableConv2d(in_channels=256, out_channels=prior_count * 4, kernel_size=3, padding=1, onnx_compatible=False),
Conv2d(in_channels=64, out_channels=prior_count * 4, kernel_size=1),
])

I can share more details on the above if required.

Kept 4 prior specs in generate_ssd_priors (the first two are commented out):
specs = [
# SSDSpec(19, 16, SSDBoxSizes(60, 105), [2, 3]),  # commented out
# SSDSpec(10, 32, SSDBoxSizes(105, 150), [2, 3]),  # commented out
SSDSpec(5, 64, SSDBoxSizes(150, 195), [2, 3]),
SSDSpec(3, 100, SSDBoxSizes(195, 240), [2, 3]),
SSDSpec(2, 150, SSDBoxSizes(240, 285), [2, 3]),
SSDSpec(1, 300, SSDBoxSizes(285, 330), [2, 3]),
]
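
To give a feel for what this change does, here is a rough count of the prior boxes with and without the two commented-out specs. This is only a sketch: `count_priors` is a hypothetical helper, not a function from the repo, and it assumes 6 priors per feature-map cell (as in the commented-out header lines, `6 * num_classes`). The actual value of `prior_count` in my code may differ.

```python
# Feature-map grid sizes, taken from the first field of each SSDSpec above.
original_maps = [19, 10, 5, 3, 2, 1]  # all six specs
reduced_maps = [5, 3, 2, 1]           # with the 19x19 and 10x10 specs commented out

# ASSUMPTION: 6 priors per cell, as in the commented-out header lines (6 * num_classes).
PRIORS_PER_CELL = 6

def count_priors(feature_map_sizes, priors_per_cell=PRIORS_PER_CELL):
    """Hypothetical helper: total priors = sum of (grid cells) * (priors per cell)."""
    return sum(size * size * priors_per_cell for size in feature_map_sizes)

print(count_priors(original_maps))  # 3000
print(count_priors(reduced_maps))   # 234
```

Under that assumption the network works with 234 priors instead of 3000, and the finest grid left is 5x5.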

Hyperparameters used:

LR schedule: multistep, milestones 80, 100, 120, 150
Batch size: 128
Validation check: every 5 epochs
Base net: MobileNetV2
Optimizer: SGD, momentum values used: 0.9 and 0.5
Not freezing any layers, so the whole network is fine-tuned
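
To make the learning-rate schedule concrete, here is a small sketch of how a multistep schedule with the milestones above decays the rate. The base learning rate and decay factor are not given in this post, so `base_lr = 1e-3` and `gamma = 0.1` are assumptions used purely for illustration, and `lr_at_epoch` is a hypothetical helper.

```python
from bisect import bisect_right

milestones = [80, 100, 120, 150]  # from the settings above
base_lr = 1e-3  # ASSUMPTION: not stated in the post
gamma = 0.1     # ASSUMPTION: not stated in the post

def lr_at_epoch(epoch):
    """Multistep decay: multiply the base LR by gamma once per milestone reached."""
    return base_lr * gamma ** bisect_right(milestones, epoch)

# Print the learning rate just before and at each milestone.
for epoch in (0, 79, 80, 100, 120, 150):
    print(epoch, lr_at_epoch(epoch))
```

With these assumed values, the learning rate is 10,000 times smaller than the starting rate from epoch 150 onward.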

Data augmentation: geometric expansion and random image crop. I avoided photometric changes because some sub-parts do not have enough features. I am also not using RandomMirror.

Our objective is to keep the model size under 12-13 MB while reaching at least 75% mAP for all classes, so that we can launch the model on our auto platform. Any suggestions are very welcome!

Kindly help. We need to push it to a production-ready platform.