Implementation of Object Detection Training on Custom Dataset

I am training object detectors (Faster R-CNN and RetinaNet) on a custom image dataset. I am running into empty or garbage output from the trained detectors, and I would appreciate any help in resolving these issues.

  • Data: RGB images of size 3040 × 4048 (3 channels)
  • Task: Detection of a single type of object in the images
  • Model:
    • Retinanet: torchvision.models.detection.retinanet_resnet50_fpn(pretrained=False, pretrained_backbone=True, num_classes=2)
    • Faster RCNN:
      from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
      model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
      num_classes = 2  # 1 object class + background
      in_features = model.roi_heads.box_predictor.cls_score.in_features
      model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
  • Problems:
    • The Faster RCNN produces bounding boxes that are very small and always in the top-left corner of the image - they never contain the object.
    • The RetinaNet produces empty predictions - the “boxes” entry in the output dictionary is an empty 0 × 4 tensor.
  • Questions:
    • Why is the Faster RCNN producing garbage boxes even though the loss decreases during training?
    • Why does the RetinaNet model produce empty boxes even before training? What can I do to resolve these issues?
    • How should the model be changed to accommodate images that are much larger than those in standard datasets? Is this necessary?
  • Remark: The RetinaNet model produces empty output even before training.

Please let me know if any details of the implementation will help in identifying the problem. Thank you in advance for your help.


Hi! I’m trying to do an object detection task as well. Can you share your code or point me to a good tutorial?