Convert Pandas DataFrame to List[Tensor[L, 4]] or Tensor[K, 5]

MikeTensor · February 10, 2023, 2:02pm

Hello!

I am trying to convert the bounding boxes in my Pandas DataFrame to a list of tensors, or to one single tensor.
After conversion it should look like:

(Tensor[K, 5] or List[Tensor[L, 4]]).

This is the format described in the roi_align documentation.

bboxes_tensor = torch.tensor([df.bbox], dtype=torch.float) doesn’t work with roi_align.

Any idea how to get the conversion done?

Here is an example of what the bounding boxes in the DataFrame could look like for a random image.
The number of bboxes varies.

print(df.bbox)

0 [1056.16, 190.5, 1241.0, 374.0]
1 [359.43, 179.3, 516.3, 270.97]
2 [444.13, 163.92, 559.2, 247.04]
3 [870.59, 115.22, 1101.06, 213.85]
4 [724.49, 171.09, 790.5, 215.13]
5 [534.25, 166.2, 596.08, 216.76]
6 [709.81, 162.43, 765.7, 205.68]
7 [562.02, 166.42, 605.57, 206.89]
8 [600.19, 163.56, 633.57, 188.6]
Name: bbox, dtype: object

ptrblck · February 14, 2023, 6:00am

I would guess tensor = torch.from_numpy(df.bbox.to_numpy()) might work assuming your pd.DataFrame can be expressed as a numpy array.

MikeTensor · February 14, 2023, 12:28pm

Thanks for your reply @ptrblck

Unfortunately it doesn’t work:
TypeError: can't convert np.ndarray of type numpy.object_. The only supported types are: float64, float32, float16, complex64, complex128, int64, int32, int16, int8, uint8, and bool.
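
For context, the error seems to come from the fact that each cell of the bbox column holds a Python list, so to_numpy() returns an array of dtype object rather than a numeric 2-D array. A minimal sketch of one way around that, assuming every row holds exactly four numbers (the toy DataFrame below only mirrors the first rows of the example above):

```python
import numpy as np
import pandas as pd
import torch

# Toy frame mirroring the first rows of the example above.
df = pd.DataFrame({"bbox": [
    [1056.16, 190.5, 1241.0, 374.0],
    [359.43, 179.3, 516.3, 270.97],
    [444.13, 163.92, 559.2, 247.04],
]})

# Each cell is a Python list, so the column converts to an object array.
print(df.bbox.to_numpy().dtype)

# Assumption: every row has exactly four numbers (x1, y1, x2, y2).
# Going through tolist() yields a nested list that NumPy can turn
# into a regular float array of shape (K, 4).
boxes_np = np.array(df.bbox.tolist(), dtype=np.float32)
boxes = torch.from_numpy(boxes_np)  # Tensor[K, 4]
```

If the rows had different lengths, np.array would not be able to build a regular (K, 4) array, so this only applies to well-formed boxes.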

But in case somebody is facing the same issue, I found a (messy) workaround:

import torch

df['batch_index'] = 0  # every box belongs to image 0 (batch size 1)
bboxes_batch = torch.tensor(df.batch_index.tolist(), dtype=torch.float)  # shape [K]
bboxes_tensor = torch.tensor(df.bbox.tolist(), dtype=torch.float)  # shape [K, 4]
rois = torch.cat((bboxes_batch.unsqueeze(1), bboxes_tensor), 1)  # shape [K, 5]

This works for me with roi_align:

tensor([[   0.0000, 1056.1600,  190.5000, 1241.0000,  374.0000],
        [   0.0000,  359.4300,  179.3000,  516.3000,  270.9700],
        [   0.0000,  444.1300,  163.9200,  559.2000,  247.0400],
        [   0.0000,  870.5900,  115.2200, 1101.0601,  213.8500],
        [   0.0000,  724.4900,  171.0900,  790.5000,  215.1300],
        [   0.0000,  534.2500,  166.2000,  596.0800,  216.7600],
        [   0.0000,  709.8100,  162.4300,  765.7000,  205.6800],
        [   0.0000,  562.0200,  166.4200,  605.5700,  206.8900],
        [   0.0000,  600.1900,  163.5600,  633.5700,  188.6000]])

I add a column of zeros to the DataFrame, make one tensor out of the bbox column and one out of the pseudo batch_index column,
and glue (cat) both tensors together.

Pretty sure this is not super elegant, but for now it works.
If somebody knows a smoother way, feel free to comment.

Note that this only works with roi_align if the batch size is one. Otherwise the first column has to hold the actual batch index (0, 1, …) of each box.
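
For a batch with more than one image, a rough sketch of both target formats could look like the following. It assumes the DataFrame has a column saying which image each box belongs to; the name image_id is hypothetical and not part of the data shown above, and the order in which images first appear is assumed to match their order in the input batch.

```python
import pandas as pd
import torch

# Hypothetical frame: "image_id" is an assumed column, not from the post above.
df = pd.DataFrame({
    "image_id": ["a", "a", "b"],
    "bbox": [
        [1056.16, 190.5, 1241.0, 374.0],
        [359.43, 179.3, 516.3, 270.97],
        [444.13, 163.92, 559.2, 247.04],
    ],
})

# Option 1: List[Tensor[L, 4]], one tensor per image.
# sort=False keeps the images in order of first appearance.
boxes_list = [
    torch.tensor(group.tolist(), dtype=torch.float)
    for _, group in df.groupby("image_id", sort=False)["bbox"]
]

# Option 2: Tensor[K, 5], with the batch index in the first column.
# factorize maps each image_id to 0, 1, ... in order of first appearance.
batch_index = torch.tensor(pd.factorize(df["image_id"])[0], dtype=torch.float)
boxes = torch.tensor(df["bbox"].tolist(), dtype=torch.float)
rois = torch.cat((batch_index.unsqueeze(1), boxes), dim=1)
```

Either boxes_list or rois should then be usable as the boxes argument of roi_align, provided the batch indices really line up with the images in the input tensor.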