I have a model that predicts a binary mask. I want to extract the bounding box coordinates without moving the tensor to the CPU or converting it to NumPy. Is there an efficient way to do this in PyTorch?

Hi @Sabbir_Ahmed,

What format do you want the coordinates to be extracted in?

You can access the raw data pointer with

```
t.data_ptr()
```

Then you can cast/interpret it as a C type with `ctypes`:

```
import ctypes
c_type_pointer = ctypes.c_void_p(t.data_ptr())
# Then you can process or copy from c_type_pointer
```

```
#I have a predicted mask
pred_mask = model(input)
# I just want an efficient function that maps the mask to coordinates of bounding boxes without calling cpu
coordinates = function(pred_mask)
```

I am sorry, but I don't have a strong understanding of raw data pointers.

Could you please suggest a code snippet for this function in PyTorch, @spanev?

You can refer to the code below (NumPy-based):

```
import numpy as np


def extract_bboxes(mask):
    """Compute bounding boxes from masks.

    mask: [height, width, num_instances]. Mask pixels are either 1 or 0.
    Returns: bbox array [num_instances, (y1, x1, y2, x2)].
    """
    boxes = np.zeros([mask.shape[-1], 4], dtype=np.int32)
    for i in range(mask.shape[-1]):
        m = mask[:, :, i]
        # Columns and rows that contain at least one mask pixel.
        horizontal_indices = np.where(np.any(m, axis=0))[0]
        vertical_indices = np.where(np.any(m, axis=1))[0]
        if horizontal_indices.shape[0]:
            x1, x2 = horizontal_indices[[0, -1]]
            y1, y2 = vertical_indices[[0, -1]]
            # x2 and y2 should not be part of the box. Increment by 1.
            x2 += 1
            y2 += 1
        else:
            # No mask for this instance. Might happen due to
            # resizing or cropping. Set bbox to zeros.
            x1, x2, y1, y2 = 0, 0, 0, 0
        boxes[i] = np.array([y1, x1, y2, x2])
    return boxes.astype(np.int32)
```
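
Since the original question asked for a version that stays on the GPU, here is a minimal sketch of the same logic in pure PyTorch. It is my own port, not code from the posts above: the name `extract_bboxes_torch` is hypothetical, and it assumes the masks are laid out as `[num_instances, height, width]` (use `mask.permute(2, 0, 1)` to convert from the layout above). It keeps the same `(y1, x1, y2, x2)` convention with exclusive `y2`/`x2`, and returns zeros for empty masks.

```python
import torch


def extract_bboxes_torch(masks: torch.Tensor) -> torch.Tensor:
    """Bounding boxes from masks without leaving the tensor's device.

    masks: [num_instances, height, width]; nonzero pixels are foreground.
    Returns: int64 tensor [num_instances, 4] as (y1, x1, y2, x2),
    where y2 and x2 are exclusive. Empty masks give (0, 0, 0, 0).
    """
    masks = masks.bool()
    n, h, w = masks.shape
    rows = masks.any(dim=2)  # [n, h]: rows containing a mask pixel
    cols = masks.any(dim=1)  # [n, w]: columns containing a mask pixel

    ys = torch.arange(h, device=masks.device).expand(n, h)
    xs = torch.arange(w, device=masks.device).expand(n, w)

    # Smallest / largest index among the occupied rows and columns.
    # Unoccupied positions are filled with sentinels that can never win.
    y1 = ys.masked_fill(~rows, h).min(dim=1).values
    y2 = ys.masked_fill(~rows, -1).max(dim=1).values + 1
    x1 = xs.masked_fill(~cols, w).min(dim=1).values
    x2 = xs.masked_fill(~cols, -1).max(dim=1).values + 1

    boxes = torch.stack([y1, x1, y2, x2], dim=1)
    # Zero out boxes of instances that have no mask pixels at all.
    non_empty = rows.any(dim=1, keepdim=True)
    return boxes * non_empty


masks = torch.zeros(2, 5, 6)
masks[0, 1:3, 2:5] = 1          # instance 0 covers rows 1-2, columns 2-4
boxes = extract_bboxes_torch(masks)  # instance 1 is left empty
```

Every operation here is a batched tensor op, so there is no Python loop over instances and no `.cpu()` call; the result lives on whatever device `masks` is on.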

Hope it will help you~ @Sabbir_Ahmed

@QingEn

Thanks. If I also want to capture some area around the bounding box, what should I do?

The intent is to capture the image region inside the bounding box plus the vicinity around it.
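
One way to do this, sketched under my own assumptions, is to grow each box by a fixed pixel margin and clamp the result to the image bounds. The helper name `expand_boxes` and the `margin` parameter are hypothetical; boxes are assumed to be `(y1, x1, y2, x2)` with exclusive `y2`/`x2`, as in the code above.

```python
import torch


def expand_boxes(boxes: torch.Tensor, margin: int, height: int, width: int) -> torch.Tensor:
    """Grow (y1, x1, y2, x2) boxes by `margin` pixels on every side,
    clamped so they stay inside an image of size height x width."""
    offsets = torch.tensor(
        [-margin, -margin, margin, margin], dtype=boxes.dtype, device=boxes.device
    )
    upper = torch.tensor(
        [height, width, height, width], dtype=boxes.dtype, device=boxes.device
    )
    # Clamp below at 0 and above at the image size.
    return torch.minimum((boxes + offsets).clamp(min=0), upper)


boxes = torch.tensor([[1, 2, 3, 5]])
padded = expand_boxes(boxes, margin=2, height=5, width=6)

# Cropping needs Python integers, so this step does copy four numbers to the host.
image = torch.rand(3, 5, 6)  # (channels, height, width), hypothetical input
y1, x1, y2, x2 = padded[0].tolist()
crop = image[:, y1:y2, x1:x2]
```

Note that an all-zero box (an empty mask) would also be grown by this helper, so you may want to filter those out first.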

Using skimage, NumPy, and pandas (pandas is not strictly necessary):

```
from typing import Dict, List, Optional

import numpy as np
import pandas as pd
from skimage.measure import label, regionprops
import torch as th

# NOTE: this constant was not defined in the original snippet. The names below
# are an assumption; only "channel_id" is confirmed by the code that uses it.
# For 2D input, regionprops' bbox is (min_row, min_col, max_row, max_col).
COLUMN_NAMES_SEGMENTATION = ["file_id", "y_min", "x_min", "y_max", "x_max", "channel_id"]


def simple_boxing(
    classmasks: th.Tensor,
    file_ids: Optional[List[int]] = None,
    channel2class: Optional[Dict[int, int]] = None,
) -> pd.DataFrame:
    """
    Convert a tensor of shape (B, C, H, W) to a pandas DataFrame with bounding boxes.

    Args:
        classmasks: Boolean tensor with shape (batch_size, channels, height, width)
        file_ids: Optional identifiers for each image in the batch
        channel2class: Optional mapping from channel index to class id
    """
    assert classmasks.dtype == th.bool, "classmasks data type must be boolean."
    assert (
        len(classmasks.shape) == 4
    ), f"classmasks with shape {classmasks.shape} should have 4 dimensions"
    # To save time, skip all empty channels.
    has_detections = (
        classmasks.view(classmasks.shape[0:2] + (-1,))
        .any(dim=2)
        .nonzero()
        .cpu()
        .numpy()
    )
    boxes_list = []
    classmasks = classmasks.cpu().numpy()
    for img_idx, channel_idx in has_detections:
        # Label connected regions, so each disconnected blob gets its own box.
        labels = label(classmasks[img_idx, channel_idx], background=0, connectivity=2)
        props = regionprops(labels)
        file_id = file_ids[img_idx] if file_ids is not None else img_idx
        boxes_list.extend([(file_id, *x.bbox, channel_idx) for x in props])
    boxes = pd.DataFrame(boxes_list, columns=COLUMN_NAMES_SEGMENTATION)
    if channel2class is not None:
        boxes["class_id"] = boxes["channel_id"].map(channel2class)
    return boxes
```

Just in case someone needs it.

The answers above address how to do it when you want just one box per mask. If you stumbled upon this question looking for how to extract multiple boxes per mask (e.g. when regions are disconnected), see: