FasterRCNN - images with no objects present cause an error

I’m trying to follow this example (“Traffic Light Detection | Pytorch Starter” on Kaggle) to train a FasterRCNN object detection model. A large number of the images in my dataset contain no object of interest, and therefore no annotations. However, data augmentation with Albumentations then fails with:

      File "/usr/local/lib/python3.6/dist-packages/albumentations/augmentations/bbox_utils.py", line 330, in check_bbox
        "to be in the range [0.0, 1.0], got {value}.".format(bbox=bbox, name=name, value=value)
    ValueError: Expected x_min for bbox (tensor(nan), tensor(nan), tensor(nan), tensor(nan), tensor(-9223372036854775808)) to be in the range [0.0, 1.0], got nan.

Any suggestions for how best to represent cases where there are no objects of interest?

Would it be possible to filter out these “empty” images beforehand or do they still contain another target value?

@ptrblck thank you for your response! I would have to include these empty images, since a large proportion of the dataset contains no target objects, so I feel I should keep them in my train/val sets.

I have the same problem.
As a temporary solution, I set label 0 for the background and make the bbox the size of my image when there is no object to detect, but I don’t think that’s a good solution. (An empty bbox or bbox = [0, 0, 0, 0] returns an error.)

I also need images with no targets, because they represent a major part of my dataset.

Solved this by replacing the boxes with an empty tensor of shape [0, 4]. For example:

    if np.isnan((target['boxes']).numpy()).any() or target['boxes'].shape == torch.Size([0]):
        target['boxes'] = torch.zeros((0,4),dtype=torch.float32)
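
Building on that fix, here is a minimal sketch of a target-building helper along the same lines (the `build_target` name is my own, not a torchvision API): images without objects get a `(0, 4)` float boxes tensor and an empty `int64` labels tensor, which torchvision’s FasterRCNN accepts as “no objects”.

```python
import torch

def build_target(boxes, labels):
    """Build a FasterRCNN-style target dict, handling images with no objects.

    boxes:  list of [x1, y1, x2, y2] lists (may be empty)
    labels: list of int class ids (may be empty)
    """
    if len(boxes) == 0:
        # Empty image: shape (0, 4) boxes and shape (0,) labels,
        # rather than nan placeholders or a [0, 0, 0, 0] box.
        return {
            "boxes": torch.zeros((0, 4), dtype=torch.float32),
            "labels": torch.zeros((0,), dtype=torch.int64),
        }
    return {
        "boxes": torch.as_tensor(boxes, dtype=torch.float32),
        "labels": torch.as_tensor(labels, dtype=torch.int64),
    }
```

A `Dataset.__getitem__` could call this instead of emitting nan boxes, which also avoids the Albumentations range check blowing up on nan values.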

Is it possible to validate/test on images with no targets?

I don’t know what “validate” would mean in this context, if no targets are given. Could you explain your use case a bit more?

Yes, apologies: I meant the validation and/or testing phase (basically the same thing).

I followed PyTorch’s tutorial with faster-rcnn. I plan to train only on images that contain objects, although out of interest I just tried training an object detector with no objects; it exited swiftly, as the loss was nan.

I want to test and evaluate on images that also include no targets. I’ve tried it just now and it appears to work: I fed in images with no bounding-box information (basically nan values in place of x1, y1, etc.). However, I want to make sure this is correct behaviour and that I’m not creating something that will lead to problems later.

I plan on keeping tabs on the TPs, FPs and FNs for future evaluation.

During the validation and testing phase you would call model.eval() which should then accept the inputs only and will return the predictions. Targets would not be needed in this case.
I’m unsure if you are still using the model in training mode and are thus passing invalid targets to it. If so, I would stick to the model.eval() approach.


So, I have used this code snippet to determine the mAP for object detection:

I believe it is a widespread example commonly used to determine mAP across a dataset of images. I’m not sure it’s absolutely correct, as I did get strange mAP scores (i.e. 0.5 many times); however, I’m not too familiar with mAP at the moment.

Nevertheless, I’m unsure if it would distort the metric when I include images that do not contain true positive detections.

For example, there is a snippet of code that accumulates true-positive and false-positive detections, from which recall and precision can be determined. However, I believe this is being computed per image or per batch rather than globally.

The issue is that if an image has no ground-truth boxes, it never gets appended to the ground-truth list, and then I don’t know how false positives in that particular image are taken into account when determining mAP.

      # (assumes earlier in the script: from collections import Counter; import torch; import torchvision)
      # use Counter to create a dictionary where key is image # and value
      # is the # of bboxes in the given image
      amount_bboxes = Counter([gt[0] for gt in ground_truths])

      # goal: keep track of the gt bboxes we have already "detected" with prior predicted bboxes
      # key: image #
      # value: tensor of 0's (size is equal to # of bboxes in the given image)
      for key, value in amount_bboxes.items():
        amount_bboxes[key] = torch.zeros(value)

      # sort over the probability scores of the detections
      detections.sort(key = lambda x: x[2], reverse = True)
      
      true_Positives = torch.zeros(len(detections))
      false_Positives = torch.zeros(len(detections))
      total_gt_bboxes = len(ground_truths)

      # iterate through all detections in given class c
      for detection_index, detection in enumerate(detections):
        # detection[0] indicates image #
        # ground_truth_image: the gt bbox's that are in same image as detection
        ground_truth_image = [bbox for bbox in ground_truths if bbox[0] == detection[0]]

        # num_gt_boxes: number of ground truth boxes in given image
        num_gt_boxes = len(ground_truth_image)
        best_iou = 0
        best_gt_index = 0


        for index, gt in enumerate(ground_truth_image):
          
          iou = torchvision.ops.box_iou(torch.tensor(detection[3:]).unsqueeze(0), 
                                        torch.tensor(gt[3:]).unsqueeze(0))
          
          if iou > best_iou:
            best_iou = iou
            best_gt_index = index

        if best_iou > iou_threshold:
          # check if gt_bbox with best_iou was already covered by previous detection with higher confidence score
          # amount_bboxes[detection[0]][best_gt_index] == 0 if not discovered yet, 1 otherwise
          if amount_bboxes[detection[0]][best_gt_index] == 0:
            true_Positives[detection_index] = 1
            amount_bboxes[detection[0]][best_gt_index] = 1  # mark this gt box as matched
            true_positives_frame.append(detection)

          else:
            false_Positives[detection_index] = 1
            false_positives_frame.append(detection)
        else:
          false_Positives[detection_index] = 1
          false_positives_frame.append(detection)

This is then used to calculate precision and recall:

 # tensor ex: [1, 0, 0, 1] -> [1, 1, 1, 2]
      true_pos_cumulative_sum = torch.cumsum(true_Positives, dim = 0)
      false_pos_cumulative_sum = torch.cumsum(false_Positives, dim = 0)

      # calculate recall and precision for given class
      recalls = true_pos_cumulative_sum / (total_gt_bboxes + epsilon)
      precisions = torch.divide(true_pos_cumulative_sum, (true_pos_cumulative_sum + false_pos_cumulative_sum + epsilon))
      
      # add 1 to precisions to start graph at (0,1) for integration
      precisions = torch.cat((torch.tensor([1]), precisions))
      recalls = torch.cat((torch.tensor([0]), recalls))
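
The per-class AP is then typically the area under this precision-recall curve, e.g. via trapezoidal integration. A small sketch with made-up placeholder values standing in for the tensors produced above:

```python
import torch

# Hypothetical precision/recall curve in the shape produced above:
# precisions starts at 1.0 and recalls starts at 0.0 after the torch.cat calls.
precisions = torch.tensor([1.0, 1.0, 0.5, 0.6667])
recalls = torch.tensor([0.0, 0.5, 0.5, 1.0])

# AP for the class = area under the precision-recall curve,
# integrating precision over recall.
average_precision = torch.trapz(precisions, recalls)
```

mAP would then be the mean of these per-class AP values.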

Further details on mAP can be seen on the GitHub link provided above.

(On a further note, I only feed in one image at a time during evaluation (testing), as I haven’t figured out a way of applying NMS to a batch yet. It’s OK for now and hopefully not too computationally expensive.)

If this implementation of mAP is correct, I’m wondering, how can I use this to determine the f1-score globally rather than per batch/image?

I’m guessing that I should accumulate true-positive detections, false-positive detections and total ground-truth box counts outside the dataloader for-loop (over every image in the dataset), and then determine recall and precision at the end and eventually use these to determine F1…

Something like this:

true_Positives_global = []
false_Positives_global = []
total_gt_bboxes_global = []

for images, targets in data_loader:
    # OUTPUT PREDICTIONS, APPLY NMS etc.

    true_Positives = torch.zeros(len(detections))
    false_Positives = torch.zeros(len(detections))
    total_gt_bboxes = len(ground_truths)

    # iterate through all detections in given class c
    for detection_index, detection in enumerate(detections):
      # detection[0] indicates image #
      # ground_truth_image: the gt bboxes that are in the same image as the detection
      ground_truth_image = [bbox for bbox in ground_truths if bbox[0] == detection[0]]

      # num_gt_boxes: number of ground truth boxes in the given image
      num_gt_boxes = len(ground_truth_image)
      best_iou = 0
      best_gt_index = 0

      for index, gt in enumerate(ground_truth_image):
        iou = torchvision.ops.box_iou(torch.tensor(detection[3:]).unsqueeze(0),
                                      torch.tensor(gt[3:]).unsqueeze(0))

        if iou > best_iou:
          best_iou = iou
          best_gt_index = index

      if best_iou > iou_threshold:
        # check if gt_bbox with best_iou was already covered by a previous detection with a higher confidence score
        # amount_bboxes[detection[0]][best_gt_index] == 0 if not discovered yet, 1 otherwise
        if amount_bboxes[detection[0]][best_gt_index] == 0:
          true_Positives[detection_index] = 1
          amount_bboxes[detection[0]][best_gt_index] = 1  # mark this gt box as matched

        else:
          false_Positives[detection_index] = 1
      else:
        false_Positives[detection_index] = 1

    # append to the global lists
    true_Positives_global.append(true_Positives)
    false_Positives_global.append(false_Positives)
    total_gt_bboxes_global.append(total_gt_bboxes)


# DETERMINE PRECISION, RECALL AND F1 SCORE HERE 
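
To go from those global lists to a single dataset-wide F1, something like the following sketch might work (the concrete tensors here are placeholder values standing in for what the loop above would accumulate):

```python
import torch

# Hypothetical accumulated values from the loop above: per-image TP/FP
# indicator tensors and per-image ground-truth counts.
true_Positives_global = [torch.tensor([1.0, 0.0]), torch.tensor([1.0])]
false_Positives_global = [torch.tensor([0.0, 1.0]), torch.tensor([0.0])]
total_gt_bboxes_global = [2, 1]

epsilon = 1e-6

# Concatenate across images so the counts are global, not per batch/image.
tp = torch.cat(true_Positives_global).sum()
fp = torch.cat(false_Positives_global).sum()
total_gt = sum(total_gt_bboxes_global)

precision = tp / (tp + fp + epsilon)
recall = tp / (total_gt + epsilon)
f1 = 2 * precision * recall / (precision + recall + epsilon)
```

Note that images with no ground truth contribute nothing to `total_gt` but their detections still land in the false-positive count, which is exactly how they should pull precision (and hence F1) down.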

I’ve managed to determine losses during validation; however, I’m wondering whether this is still appropriate/applicable for images with no ground-truth targets.

I do get losses with nan values, as shown here for one image without ground truth:

# output losses dict for image without targets
{'loss_classifier': tensor(0.3572, grad_fn=<NllLossBackward0>), 'loss_box_reg': tensor(nan, grad_fn=<DivBackward0>), 'loss_objectness': tensor(18.0578, grad_fn=<BinaryCrossEntropyWithLogitsBackward0>), 'loss_rpn_box_reg': tensor(nan, grad_fn=<DivBackward0>)}

# when losses are summed:
tensor(nan, grad_fn=<AddBackward0>)

I’m assuming there isn’t a workaround and that it’s best to just train and validate using images with ground truth if I want to determine validation losses?
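
If filtering is acceptable, one minimal sketch would be to drop the images without boxes before the loss forward pass (the helper name is hypothetical, not an existing API):

```python
import torch

def filter_images_with_targets(images, targets):
    """Keep only (image, target) pairs that have at least one gt box,
    so the summed validation loss stays finite."""
    keep = [i for i, t in enumerate(targets) if t["boxes"].numel() > 0]
    return [images[i] for i in keep], [targets[i] for i in keep]
```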

I don’t know what the loss is supposed to represent if no targets are given, so I think you are right that it makes more sense to calculate the loss if targets are given.
