Hello, I am trying to perform the following transformation, ideally using existing Caffe2 operators:
Inputs:
boxes.shape=[nb_boxes, 6]
where each row boxes[i, :] is [x1, y1, x2, y2, confidence (float), category_index (int)].
Notably, the rows of boxes are ordered sequentially by image.
batch_splits.shape=[nb_images]
Gives the number of boxes for each image in the batch.
Example: if batch_splits = [42, 3], then the first 42 boxes in `boxes` belong to the first image, and the next 3 boxes belong to the second image.
Output:
boxes_by_batch.shape=[nb_images, nb_boxes_max, 6]
A rearrangement of boxes in the familiar batch-style format, e.g. boxes_by_batch[0, :, :] contains the boxes for image 0, boxes_by_batch[1, :, :] contains the boxes for image 1, etc.
The output is zero-padded, which matters because each image may have a different number of detected boxes. The second dimension, nb_boxes_max, is the largest number of boxes for any single image in the batch.
Here is a specific example + code that better illustrates what I’m trying to do:
import numpy as np
from caffe2.python import workspace
# [1/3] Initialize input data
boxes = np.array([
    [10, 20, 100, 200, 0.75, 1],
    [20, 30, 50, 70, 0.95, 3],
    [80, 100, 25, 25, 0.5, 2]
], dtype=np.float32)
# batch_splits indicates that first two boxes belong to img0, and third box
# belongs to img1
batch_splits = np.array([2, 1], dtype=np.float32)
workspace.FeedBlob("boxes", boxes)
workspace.FeedBlob("batch_splits", batch_splits)
# [2/3] Add op(s) that emit boxes_by_batch
ops = [] # TODO: fill me!
workspace.RunOperatorsOnce(ops)
# [3/3] Fetch boxes_by_batch, compare to desired output
boxes_by_batch = workspace.FetchBlob("boxes_by_batch")
boxes_by_batch_desired = np.array([
    [
        [10, 20, 100, 200, 0.75, 1],
        [20, 30, 50, 70, 0.95, 3]
    ],
    [
        [80, 100, 25, 25, 0.5, 2],
        [0, 0, 0, 0, 0, 0]
    ]
])
print("same? {}".format(np.allclose(boxes_by_batch, boxes_by_batch_desired)))
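To pin down the exact semantics I'm after, here is a plain NumPy sketch of the transformation. This is not a Caffe2 solution, only a reference for the desired behavior; the helper name `pack_boxes_by_batch` is made up for illustration, and it assumes the values in batch_splits are whole numbers that sum to nb_boxes.

```python
import numpy as np

def pack_boxes_by_batch(boxes, batch_splits):
    """Hypothetical reference helper: regroup flat boxes into a zero-padded batch.

    boxes:        [nb_boxes, 6] array, rows ordered by image.
    batch_splits: [nb_images] array, number of boxes per image.
    Returns:      [nb_images, nb_boxes_max, 6] array, zero-padded.
    """
    # batch_splits may arrive as float32, so convert the counts to integers.
    counts = np.asarray(batch_splits).astype(np.int64)
    nb_images = counts.shape[0]
    nb_boxes_max = int(counts.max()) if nb_images > 0 else 0

    # Start from all zeros so that unused slots stay zero-padded.
    out = np.zeros((nb_images, nb_boxes_max, boxes.shape[1]), dtype=boxes.dtype)

    start = 0
    for i, n in enumerate(counts):
        # Copy the n boxes of image i into the first n slots of row i.
        out[i, :n] = boxes[start:start + n]
        start += n
    return out

# Same toy data as in the example above.
boxes = np.array([
    [10, 20, 100, 200, 0.75, 1],
    [20, 30, 50, 70, 0.95, 3],
    [80, 100, 25, 25, 0.5, 2],
], dtype=np.float32)
batch_splits = np.array([2, 1], dtype=np.float32)

packed = pack_boxes_by_batch(boxes, batch_splits)
print(packed.shape)
```

What I'm looking for is an operator (or a small chain of operators) that produces the same result inside the net.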
Context: I’m trying to serve a Detectron object detection model (FRCNN+FRN) with batch inference enabled. The Detectron model emits detected boxes in the `boxes`, `batch_splits` format, but for my use case it’d be easier if the network instead emitted the detected boxes in the `boxes_by_batch` format.
Ideally, I’d like to use existing Caffe2 operators so that I don’t have to write a custom C++ operator (and deal with linking it to our prod env, etc).
Thank you!