I have a model that predicts a binary mask. I want to extract the bounding box coordinates without calling `.cpu()` or converting to NumPy. Is there an efficient way to do this in PyTorch?

Hi @Sabbir_Ahmed,

What format do you want the coordinates to be extracted in?

You can access the raw data pointer with

```
t.data_ptr()
```

Then you can cast or interpret it as a C type with `ctypes`:

```
import ctypes
c_type_pointer = ctypes.c_void_p(t.data_ptr())
# Then you can process or copy from c_type_pointer
```

```
# I have a predicted mask
pred_mask = model(input)
# I just want an efficient function that maps the mask to coordinates of bounding boxes without calling cpu
coordinates = function(pred_mask)
```

And I am sorry, but I don't have a strong understanding of raw data pointers.

Could you please suggest a code snippet of the function in PyTorch, @spanev?

You can refer to the code below:

```
import numpy as np


def extract_bboxes(mask):
    """Compute bounding boxes from masks.

    mask: [height, width, num_instances]. Mask pixels are either 1 or 0.
    Returns: bbox array [num_instances, (y1, x1, y2, x2)].
    """
    boxes = np.zeros([mask.shape[-1], 4], dtype=np.int32)
    for i in range(mask.shape[-1]):
        m = mask[:, :, i]
        # Columns and rows that contain at least one mask pixel.
        horizontal_indices = np.where(np.any(m, axis=0))[0]
        vertical_indices = np.where(np.any(m, axis=1))[0]
        if horizontal_indices.shape[0]:
            x1, x2 = horizontal_indices[[0, -1]]
            y1, y2 = vertical_indices[[0, -1]]
            # x2 and y2 should not be part of the box. Increment by 1.
            x2 += 1
            y2 += 1
        else:
            # No mask for this instance. Might happen due to
            # resizing or cropping. Set bbox to zeros.
            x1, x2, y1, y2 = 0, 0, 0, 0
        boxes[i] = np.array([y1, x1, y2, x2])
    return boxes.astype(np.int32)
```

Hope it will help you~ @Sabbir_Ahmed
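The snippet above runs in NumPy, so the mask has to be on the CPU. Here is a minimal sketch of the same logic in pure PyTorch that stays on whatever device the mask lives on. The function name `masks_to_bboxes` and the `[num_instances, height, width]` layout are my own assumptions, not something from the posts above; if your mask is `[height, width, num_instances]`, call `mask.permute(2, 0, 1)` first. Recent torchvision releases also ship `torchvision.ops.masks_to_boxes`, which returns `(x1, y1, x2, y2)` instead, so check its documentation if you would rather not roll your own.

```python
import torch


def masks_to_bboxes(masks: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: bounding boxes from a stack of binary masks.

    masks: [num_instances, height, width], nonzero means foreground.
    Returns: int64 tensor [num_instances, (y1, x1, y2, x2)] on the same
    device, with y2/x2 exclusive. Empty masks give an all-zero box.
    """
    masks = masks != 0
    n, h, w = masks.shape
    rows = masks.any(dim=2)  # [N, H]: rows that contain a mask pixel
    cols = masks.any(dim=1)  # [N, W]: columns that contain a mask pixel

    ys = torch.arange(h, device=masks.device).expand(n, h)
    xs = torch.arange(w, device=masks.device).expand(n, w)

    # Smallest occupied index: unoccupied positions are pushed out of range.
    y1 = ys.masked_fill(~rows, h).amin(dim=1)
    x1 = xs.masked_fill(~cols, w).amin(dim=1)
    # Largest occupied index, plus 1 so the end is exclusive.
    y2 = ys.masked_fill(~rows, -1).amax(dim=1) + 1
    x2 = xs.masked_fill(~cols, -1).amax(dim=1) + 1

    boxes = torch.stack([y1, x1, y2, x2], dim=1)
    # Zero out boxes of empty masks, mirroring the NumPy version.
    empty = ~rows.any(dim=1)
    return boxes.masked_fill(empty.unsqueeze(1), 0)
```

There is no Python loop over instances and no `.cpu()` call, so with a CUDA tensor as input the result is a CUDA tensor too.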


@QingEn

Thanks. If I also want to capture some area around the bounding box, what should I do?

The intent is to capture the image portion inside the bounding box plus its vicinity.
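One possible approach is to grow each box by a fixed margin and clamp it to the image bounds. The sketch below assumes boxes in `(y1, x1, y2, x2)` format with exclusive `y2`/`x2`, as in the code earlier in this thread; the name `expand_boxes` and the `margin` parameter are made up for illustration.

```python
import torch


def expand_boxes(
    boxes: torch.Tensor, margin: int, height: int, width: int
) -> torch.Tensor:
    """Hypothetical helper: pad boxes by `margin` pixels on every side.

    boxes: integer tensor [N, (y1, x1, y2, x2)] with exclusive y2/x2.
    height, width: image size used to clamp the padded boxes.
    All-zero (empty) boxes are left as zeros.
    """
    offsets = torch.tensor(
        [-margin, -margin, margin, margin],
        dtype=boxes.dtype,
        device=boxes.device,
    )
    limits = torch.tensor(
        [height, width, height, width],
        dtype=boxes.dtype,
        device=boxes.device,
    )
    # Shift the corners outwards, then clamp to [0, image size].
    expanded = torch.minimum((boxes + offsets).clamp(min=0), limits)
    # Keep empty boxes empty instead of turning them into margin-sized boxes.
    empty = (boxes[:, 2] <= boxes[:, 0]) | (boxes[:, 3] <= boxes[:, 1])
    return expanded.masked_fill(empty.unsqueeze(1), 0)
```

To crop, something like `y1, x1, y2, x2 = expanded[i].tolist()` followed by `image[..., y1:y2, x1:x2]` should work. Note that `.tolist()` copies the four numbers to the host, because slicing needs Python integers.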

Using skimage, NumPy, and pandas (pandas is not strictly necessary):

```
from typing import Dict, List, Optional

import pandas as pd
import torch as th
from skimage.measure import label, regionprops

# Not defined in the original post. These names are an assumption based on
# how the rows are built below: regionprops' bbox is
# (min_row, min_col, max_row, max_col), and "channel_id" is read later on.
COLUMN_NAMES_SEGMENTATION = [
    "file_id", "y_min", "x_min", "y_max", "x_max", "channel_id",
]


def simple_boxing(
    classmasks: th.Tensor,
    file_ids: Optional[List[int]] = None,
    channel2class: Optional[Dict[int, int]] = None,
) -> pd.DataFrame:
    """
    Convert a tensor of shape (B, C, H, W) to a pandas DataFrame with bounding boxes.

    Args:
        classmasks: Boolean tensor with shape (batch_size, channels, height, width).
        file_ids: Optional identifier per image; defaults to the batch index.
        channel2class: Optional mapping from channel index to class id.
    """
    assert classmasks.dtype == th.bool, "classmasks data type must be boolean."
    assert (
        len(classmasks.shape) == 4
    ), f"classmasks with shape {classmasks.shape} should have 4 dimensions"
    # To save time, skip all empty channels.
    has_detections = (
        classmasks.flatten(start_dim=2)
        .any(dim=2)
        .nonzero()
        .cpu()
        .numpy()
    )
    boxes_list = []
    classmasks = classmasks.cpu().numpy()
    for img_idx, channel_idx in has_detections:
        # One box per connected component (8-connectivity) in this channel.
        labels = label(classmasks[img_idx, channel_idx], background=0, connectivity=2)
        props = regionprops(labels)
        file_id = file_ids[img_idx] if file_ids is not None else img_idx
        boxes_list.extend([(file_id, *x.bbox, channel_idx) for x in props])
    boxes = pd.DataFrame(boxes_list, columns=COLUMN_NAMES_SEGMENTATION)
    if channel2class is not None:
        boxes["class_id"] = boxes["channel_id"].map(channel2class)
    return boxes
```

Just in case someone needs it.