I’m trying to find a vectorized implementation of the following operation. I have 3 tensors:
bboxes - [#bboxes, 4], a set of bounding-box ranges. For example, bbox #1 covers rows 1-3 and columns 6-9, bbox #2 covers rows 4-7 and columns 6-13, etc.
#bboxes > 50K.
values - [#bboxes, 1], the value associated with each bounding box, for example:
[0.2, 0.5, 0.1, …]
bbox#1 → 0.2
bbox#2 → 0.5
image - [1, 512, 512].
I need to “fill” the image tensor with the corresponding value of each bbox. Each pixel should contain the highest value among the bboxes that cover it, i.e., if bbox#1 and bbox#2 share a pixel, that pixel should contain 0.5 and not 0.2.
So, a simple example that only contains 1 bbox would be:
image[:, 1:3, 6:9] = values
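For reference, the loop version described above might look roughly like the sketch below. It assumes each bbox row is laid out as [row_start, row_end, col_start, col_end] with exclusive ends (matching the slice above), that the image starts at zero, and that values are non-negative. The name `fill_loop` and the overlapping second bbox are hypothetical, chosen only to show the "highest value wins" rule.

```python
import torch

def fill_loop(bboxes: torch.Tensor, values: torch.Tensor, size: int = 512) -> torch.Tensor:
    """Reference (slow) fill: one slice update per bbox."""
    # Assumed layout: [row_start, row_end, col_start, col_end], ends exclusive.
    image = torch.zeros(1, size, size, dtype=values.dtype)
    for (r0, r1, c0, c1), v in zip(bboxes.tolist(), values.flatten()):
        region = image[:, r0:r1, c0:c1]
        # Keep the larger of the existing pixel and this bbox's value.
        image[:, r0:r1, c0:c1] = torch.maximum(region, v)
    return image

# Hypothetical data: the two bboxes overlap on row 2, columns 6-8.
bboxes = torch.tensor([[1, 3, 6, 9], [2, 7, 6, 13]])
values = torch.tensor([[0.2], [0.5]])
image = fill_loop(bboxes, values)
```

With more than 50K bboxes, this Python-level loop is the part that gets slow.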
I’m looking for a vectorized implementation without a for loop over #bboxes, since the loop is really slow with this many bboxes.
Another issue is that I can’t create a mask tensor for each bbox, since that requires a [512, 512, #bboxes] tensor, which doesn’t fit in memory.
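To put a number on the memory limit: a dense mask for 50K bboxes would need 512 * 512 * 50,000 elements, about 13 billion, or roughly 52 GB in float32. One possible middle ground, sketched here as an illustration and not a definitive answer, is to build the masks by broadcasting but only for a chunk of bboxes at a time. It still loops, but over chunks and not over individual bboxes. It assumes the same [row_start, row_end, col_start, col_end] layout with exclusive ends, non-negative values, and CPU tensors. `fill_chunked` and `chunk` are hypothetical names.

```python
import torch

def fill_chunked(bboxes: torch.Tensor, values: torch.Tensor,
                 size: int = 512, chunk: int = 256) -> torch.Tensor:
    """Max-fill using broadcast masks over `chunk` bboxes at a time."""
    rows = torch.arange(size).view(1, size, 1)  # row index of every pixel
    cols = torch.arange(size).view(1, 1, size)  # column index of every pixel
    image = torch.zeros(1, size, size, dtype=values.dtype)
    for start in range(0, bboxes.shape[0], chunk):
        b = bboxes[start:start + chunk]
        v = values[start:start + chunk].view(-1, 1, 1)
        r0, r1, c0, c1 = (b[:, i].view(-1, 1, 1) for i in range(4))
        # Boolean mask of shape [B, size, size]: True where a pixel is inside a bbox.
        mask = (rows >= r0) & (rows < r1) & (cols >= c0) & (cols < c1)
        # The bbox's value inside its box, zero elsewhere (assumes values >= 0).
        filled = torch.where(mask, v, torch.zeros_like(v))
        # Reduce over the chunk, then merge with the running maximum.
        image = torch.maximum(image, filled.amax(dim=0, keepdim=True))
    return image

# Hypothetical data: the two bboxes overlap on row 2, columns 6-8.
bboxes = torch.tensor([[1, 3, 6, 9], [2, 7, 6, 13]])
values = torch.tensor([[0.2], [0.5]])
image = fill_chunked(bboxes, values)
```

The peak intermediate here is chunk * size * size elements, so `chunk` trades memory for the number of loop iterations.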