Extracting Bounding Box Coordinates from mask

I have a model that predicts a binary mask. I want to extract the bounding box coordinates without moving the tensor to the CPU or converting it to NumPy. Is there an efficient way to do this in PyTorch?

Hi @Sabbir_Ahmed,

What format do you want the coordinates to be extracted in?

You can access the raw data pointer with

t.data_ptr()

Then you can cast/interpret it as a C type with ctypes:

import ctypes
c_type_pointer = ctypes.c_void_p(t.data_ptr())
# Then you can process or copy from c_type_pointer

# I have a predicted mask
pred_mask = model(input)

# I just want an efficient function that maps the mask to bounding box coordinates without calling .cpu()
coordinates = function(pred_mask)

I'm sorry, but I don't have a strong understanding of raw data pointers.
Could you please suggest a code snippet for such a function in PyTorch, @spanev?

You can refer to the code below:

import numpy as np


def extract_bboxes(mask):
    """Compute bounding boxes from masks.

    mask: [height, width, num_instances]. Mask pixels are either 1 or 0.

    Returns: bbox array [num_instances, (y1, x1, y2, x2)].
    """
    boxes = np.zeros([mask.shape[-1], 4], dtype=np.int32)
    for i in range(mask.shape[-1]):
        m = mask[:, :, i]
        # Bounding box.
        horizontal_indices = np.where(np.any(m, axis=0))[0]
        # Debug output; remove once the indices look right.
        print("np.any(m, axis=0)", np.any(m, axis=0))
        print("np.where(np.any(m, axis=0))", np.where(np.any(m, axis=0)))
        vertical_indices = np.where(np.any(m, axis=1))[0]
        if horizontal_indices.shape[0]:
            x1, x2 = horizontal_indices[[0, -1]]
            y1, y2 = vertical_indices[[0, -1]]
            # x2 and y2 should not be part of the box. Increment by 1.
            x2 += 1
            y2 += 1
        else:
            # No mask for this instance. Might happen due to
            # resizing or cropping. Set bbox to zeros
            x1, x2, y1, y2 = 0, 0, 0, 0
        boxes[i] = np.array([y1, x1, y2, x2])
    return boxes.astype(np.int32)

Hope it will help you~ @Sabbir_Ahmed
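
Since the original question asked to avoid the CPU and NumPy, here is a minimal sketch of the same idea in pure PyTorch. It is not from the posts above: the name masks_to_boxes_torch is hypothetical, and it assumes the masks are laid out as [num_instances, height, width] rather than [height, width, num_instances]. All tensors stay on the device of the input mask.

```python
import torch


def masks_to_boxes_torch(masks: torch.Tensor) -> torch.Tensor:
    """Return boxes [num_instances, (y1, x1, y2, x2)] for masks [N, H, W].

    y2 and x2 are exclusive, as in the NumPy function above.
    Empty masks produce an all-zero box.
    """
    n, h, w = masks.shape
    m = masks.bool()
    rows = m.any(dim=2)  # [N, H]: rows containing at least one mask pixel
    cols = m.any(dim=1)  # [N, W]: columns containing at least one mask pixel

    ys = torch.arange(h, device=masks.device)
    xs = torch.arange(w, device=masks.device)

    # First occupied index: put a large sentinel where there is no mask, take the min.
    y1 = torch.where(rows, ys, torch.full_like(ys, h)).amin(dim=1)
    x1 = torch.where(cols, xs, torch.full_like(xs, w)).amin(dim=1)
    # Last occupied index: put -1 where there is no mask, take the max, add 1.
    y2 = torch.where(rows, ys, torch.full_like(ys, -1)).amax(dim=1) + 1
    x2 = torch.where(cols, xs, torch.full_like(xs, -1)).amax(dim=1) + 1

    boxes = torch.stack([y1, x1, y2, x2], dim=1)
    # Zero out the boxes of instances that have no mask pixels at all.
    non_empty = rows.any(dim=1, keepdim=True)
    return torch.where(non_empty, boxes, torch.zeros_like(boxes))


masks = torch.zeros(2, 5, 6, dtype=torch.bool)
masks[0, 1:3, 2:5] = True  # second mask is left empty on purpose
boxes = masks_to_boxes_torch(masks)
print(boxes)
```

If torchvision is available, torchvision.ops.masks_to_boxes covers a similar case for [N, H, W] masks, but it returns boxes in (x1, y1, x2, y2) order, so check its documentation before swapping it in.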


Thanks. What should I do if I also want to capture some area around the bounding box?
The intent is to crop the image region inside the box together with its surroundings.
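
One possible way to do this, sketched under assumptions: grow the box by a fixed margin on each side and clip it to the image bounds before cropping. The helper pad_box and the margin parameter are hypothetical names, and the box is assumed to use the (y1, x1, y2, x2) layout with exclusive y2 and x2 from extract_bboxes above.

```python
import numpy as np


def pad_box(box, margin, height, width):
    """Grow (y1, x1, y2, x2) by `margin` pixels per side, clipped to the image."""
    y1, x1, y2, x2 = (int(v) for v in box)
    return (
        max(y1 - margin, 0),
        max(x1 - margin, 0),
        min(y2 + margin, height),
        min(x2 + margin, width),
    )


image = np.arange(10 * 12).reshape(10, 12)  # stand-in for a real H x W image
box = (4, 5, 7, 9)                          # e.g. one row of the bbox array
y1, x1, y2, x2 = pad_box(box, margin=2, height=image.shape[0], width=image.shape[1])
crop = image[y1:y2, x1:x2]                  # box region plus its surroundings
print((y1, x1, y2, x2), crop.shape)
```

A margin proportional to the box size (for example, a fraction of its height and width) may suit objects of very different scales better than a fixed pixel count.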

Using skimage, NumPy and pandas (pandas is not necessary):

from typing import Dict, List, Optional

import numpy as np
import pandas as pd
from skimage.measure import label, regionprops
import torch as th

# Not defined in the original post; assumed layout. regionprops' bbox is
# (min_row, min_col, max_row, max_col), with exclusive max values.
COLUMN_NAMES_SEGMENTATION = ["file_id", "y_min", "x_min", "y_max", "x_max", "channel_id"]


def simple_boxing(
    classmasks: th.Tensor,
    file_ids: Optional[List[int]] = None,
    channel2class: Optional[Dict[int, int]] = None,
) -> pd.DataFrame:
    """Convert a tensor of shape (B, C, H, W) to a pandas DataFrame with bounding boxes.

    Args:
        classmasks: Tensor with shape (batch_size, channels, height, width)
    """
    assert classmasks.dtype == th.bool, "classmasks data type must be boolean."
    assert (
        len(classmasks.shape) == 4
    ), f"classmasks with shape {classmasks.shape} should have 4 dimensions"

    # to reduce timing, skip all empty channels
    # The end of this expression is cut off in the post; .any(...).nonzero()
    # is an assumed completion that yields (img_idx, channel_idx) pairs.
    has_detections = (
        classmasks.reshape(*classmasks.shape[:2], -1).any(dim=-1).nonzero().tolist()
    )

    boxes_list = []
    classmasks = classmasks.cpu().numpy()

    for img_idx, channel_idx in has_detections:
        # One labelled region (and thus one box) per connected component.
        labels = label(classmasks[img_idx, channel_idx], background=0, connectivity=2)
        props = regionprops(labels)
        file_id = file_ids[img_idx] if file_ids is not None else img_idx
        boxes_list.extend([(file_id, *x.bbox, channel_idx) for x in props])

    boxes = pd.DataFrame(boxes_list, columns=COLUMN_NAMES_SEGMENTATION)

    if channel2class is not None:
        boxes["class_id"] = boxes["channel_id"].map(channel2class)

    return boxes

Just in case someone needs it.

The above answers address how to do it when you just want one box per mask. If you stumbled upon this question looking for how to predict multiple boxes per mask (e.g. when regions are disconnected) see: