How can I apply non-maximum suppression (NMS) to a batch of images?

I have the following function defined for non-maximum suppression (NMS) post-processing of my predictions.

At the moment, it is defined for a single prediction or output:

import torch
import torchvision

def apply_nms(orig_prediction, iou_thresh=0.3):
    # torchvision.ops.nms returns the indices of the bboxes to keep
    keep = torchvision.ops.nms(orig_prediction['boxes'], orig_prediction['scores'], iou_thresh)

    # shallow-copy the dict so the original prediction is not modified in place
    final_prediction = dict(orig_prediction)
    final_prediction['boxes'] = final_prediction['boxes'][keep]
    final_prediction['scores'] = final_prediction['scores'][keep]
    final_prediction['labels'] = final_prediction['labels'][keep]

    return final_prediction

where I then apply it to a single image:

cpu_device = torch.device("cpu")

# pick one image from the test set
img, target = valid_dataset[3]
# put the model in evaluation mode
model.to(cpu_device)
model.eval()
with torch.no_grad():
    output = model([img])[0]

nms_prediction = apply_nms(output, iou_thresh=0.1)

However, I’m not sure how I can do this efficiently for a whole batch of images from a dataloader:

cpu_device = torch.device("cpu")
model.eval()
with torch.no_grad():
    for images, targets in valid_data_loader:
        images = list(img.to(device) for img in images)

        outputs = model(images)
        outputs = [{k: v.to(cpu_device) for k, v in t.items()} for t in outputs]
        # DO NMS POST-PROCESSING HERE??

What would be the best approach? How can I apply the above defined function for multiple images? Would this be best done in another for loop?

Did you consider modifying the apply_nms function so that it supports vectorized input, i.e. a list of predictions with one entry per image?
Like:

for prediction in orig_predictions:
   keep = torchvision.ops.nms(prediction['boxes'], ...

Not saying this is the best approach…

I haven’t tried this yet but I’m assuming I would need to save the outputs for each image in a list?

Well, this code from your validation loop:

    for images, targets in valid_data_loader:
        images = list(img.to(device) for img in images)

        outputs = model(images)
        outputs = [{k: v.to(cpu_device) for k, v in t.items()} for t in outputs]

is already processing vectorized data, i.e. many data points loaded by valid_data_loader into the images and targets variables (how many depends on the batch_size of this loader). So doing NMS post-processing with a vectorized version of the apply_nms function should not be a problem. You will have to try it to see what happens.

Side note: this would not be a very elegant solution, but you could set batch_size for valid_data_loader to 1 and use the unchanged apply_nms function to check whether it works at all for a single data point per validation iteration, and only then, maybe, move to a vectorized version. One small step at a time.

Thanks for the suggestions!

I did try it with a batch size of 1 and it does work, which is useful. I’m not sure how efficient this is, but it’s better than not applying NMS!

# batch size of 1 in dataloader

cpu_device = torch.device("cpu")
model.eval()
with torch.no_grad():
  for images, targets in valid_data_loader:
    images = list(img.to(device) for img in images)
    outputs = model(images)
    outputs = [{k: v.to(cpu_device) for k, v in t.items()} for t in outputs]
    predictions = apply_nms(outputs[0], iou_thresh=0.3)

My end goal is then to compute the F1 score. I don’t know if this warrants another question on the forum, but I’m not really sure how I can extract true positives, false positives, etc. from the outputs…
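
One common convention for getting these counts, sketched here under stated assumptions rather than as the definitive way: go through the predictions in descending score order, match each one to the not-yet-matched ground-truth box of the same label with the highest IoU, and count it as a true positive if that IoU reaches a threshold (0.5 here, an assumed value). Unmatched predictions are false positives and unmatched ground-truth boxes are false negatives. The helper names iou_xyxy, count_matches and f1_score are hypothetical, and the code works on plain Python lists, so tensors would first need converting with .tolist().

```python
def iou_xyxy(a, b):
    """IoU of two boxes given as [x1, y1, x2, y2]."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def count_matches(pred, target, iou_match=0.5):
    """Return (tp, fp, fn) for one image using greedy, score-ordered matching."""
    # visit predictions from the most to the least confident
    order = sorted(range(len(pred['boxes'])), key=lambda i: pred['scores'][i], reverse=True)
    matched = set()  # indices of ground-truth boxes already claimed
    tp = fp = 0
    for i in order:
        best_iou, best_j = 0.0, None
        for j, gt_box in enumerate(target['boxes']):
            if j in matched or target['labels'][j] != pred['labels'][i]:
                continue
            iou = iou_xyxy(pred['boxes'][i], gt_box)
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_j is not None and best_iou >= iou_match:
            matched.add(best_j)
            tp += 1
        else:
            fp += 1
    fn = len(target['boxes']) - len(matched)
    return tp, fp, fn


def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN), defined as 0.0 when there is nothing to count."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0


# Toy example with made-up boxes: one good match, one spurious box, one missed object
pred = {
    'boxes': [[0, 0, 10, 10], [20, 20, 30, 30]],
    'scores': [0.9, 0.8],
    'labels': [1, 1],
}
target = {
    'boxes': [[1, 1, 11, 11], [50, 50, 60, 60]],
    'labels': [1, 2],
}
tp, fp, fn = count_matches(pred, target)
f1 = f1_score(tp, fp, fn)
```

For a whole validation set, the counts from each image would typically be summed first and the F1 score computed once from the totals.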