ValueError: All bounding boxes should have positive height and width. Found invalid box

Hi, I’m working with a Mask R-CNN network in PyTorch and need to load a dataset. I followed this tutorial: TorchVision Object Detection Finetuning Tutorial — PyTorch Tutorials 2.2.0+cu121 documentation. The polygon bounding-box coordinates of every word in each image are stored in a JSON file, but I don’t know how to compute the rectangular bounding box from those polygon coordinates.
I’m using train_labels.json from the ArT (ICDAR 2019) dataset.

import json
import numpy as np

with open('train_labels.json') as f:
    data = json.load(f)

boxes = []
for i in data.keys():        # i is the image id
    for j in data[i]:        # j is one annotated word
        mask = j['points']   # --> polygon BB coordinates, a list of [x, y] pairs
        obj_ids = np.unique(mask)
        obj_ids = obj_ids[1:]

        num_objs = len(obj_ids)

The variable mask holds the polygon bounding-box coordinates.

When I use:

left, right = min(mask, key=lambda p: p[0]), max(mask, key=lambda p: p[0])
top, bottom = min(mask, key=lambda p: p[1]), max(mask, key=lambda p: p[1])

boxes.append([left[0], top[1], right[0], bottom[1]])

then the output is
ValueError: All bounding boxes should have positive height and width. Found invalid box [24.994787216186523, 472.875732421875, 46.65693664550781, 472.875732421875] for target at index 0.

For some reason, the second and the last coordinates (ymin and ymax) are the same, but why?
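The box in the error has ymin equal to ymax, so its height is zero. A minimal sketch of a check that could be run over the boxes before building the target, assuming each box is an [xmin, ymin, xmax, ymax] list; the helper name `find_invalid_boxes` is hypothetical and not part of torchvision, and the second box is made up for illustration:

```python
def find_invalid_boxes(boxes):
    """Return (index, box) pairs whose width or height is not strictly positive."""
    invalid = []
    for idx, (xmin, ymin, xmax, ymax) in enumerate(boxes):
        if xmax <= xmin or ymax <= ymin:
            invalid.append((idx, [xmin, ymin, xmax, ymax]))
    return invalid


boxes = [
    # Box copied from the error message above: ymin == ymax, so height is zero.
    [24.994787216186523, 472.875732421875, 46.65693664550781, 472.875732421875],
    # Made-up box with positive width and height, for comparison.
    [10.0, 20.0, 30.0, 40.0],
]

bad = find_invalid_boxes(boxes)
for idx, box in bad:
    print(f"box {idx} is degenerate: {box}")
```

Printing the polygon that produced each flagged box may show whether the annotation itself is flat (all points sharing one y value) or whether something later in the pipeline changes the coordinates.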

But if I use the code from the PyTorch tutorial:

for z in range(num_objs):
    pos = np.where(masks[z])
    xmin = np.min(pos[1])
    xmax = np.max(pos[1])
    ymin = np.min(pos[0])
    ymax = np.max(pos[0])
    #print(xmin, xmax, ymin, ymax)
    boxes.append([xmin, ymin, xmax, ymax])

boxes = torch.as_tensor(boxes, dtype=torch.float32)
target = {}
target["boxes"] = boxes

Output is:
ValueError: All bounding boxes should have positive height and width. Found invalid box [0.0, 0.0, 0.0, 0.0] for target at index 0.
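The np.where loop above indexes rows and columns of masks[z], which only makes sense if each masks[z] is a 2-D binary array of shape (height, width) rather than a list of polygon points. A rough sketch of one way to build such an array from a polygon, assuming an even-odd rule evaluated at pixel centres; `polygon_to_mask` is a hypothetical helper written for illustration, and the square polygon and 8x8 image size are made up:

```python
import numpy as np


def polygon_to_mask(points, height, width):
    """Rasterize a polygon into a boolean (height, width) mask (even-odd rule)."""
    pts = np.asarray(points, dtype=np.float64)
    ys, xs = np.mgrid[0:height, 0:width]
    px, py = xs + 0.5, ys + 0.5          # pixel-centre coordinates
    inside = np.zeros((height, width), dtype=bool)

    x0, y0 = pts[:, 0], pts[:, 1]
    x1, y1 = np.roll(x0, -1), np.roll(y0, -1)   # next vertex, wrapping around
    for ax, ay, bx, by in zip(x0, y0, x1, y1):
        if ay == by:
            continue                      # horizontal edges never cross the ray
        crosses = (ay > py) != (by > py)  # edge spans this pixel row
        x_at_y = ax + (py - ay) * (bx - ax) / (by - ay)
        inside ^= crosses & (px < x_at_y) # toggle on each crossing to the right
    return inside


# Made-up axis-aligned square, given as [x, y] pairs.
square = [[2, 2], [6, 2], [6, 5], [2, 5]]
mask = polygon_to_mask(square, height=8, width=8)

# Same min/max extraction as the loop above, applied to the binary mask.
pos = np.where(mask)
box = [int(pos[1].min()), int(pos[0].min()), int(pos[1].max()), int(pos[0].max())]
print(box)
```

Boxes taken from pixel indices this way use the last covered column and row as xmax and ymax, so a mask that is only one pixel wide or tall gives xmin == xmax or ymin == ymax and would trigger the same error. Polygon-filling routines in Pillow or OpenCV would normally be preferable to a hand-written rasterizer.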

My annotation for word “red” is:

 {
                "transcription": "red",
                "language": "Latin",
                "illegibility": false,
                "points": [
                    [
                        21.51514273595292,
                        96.10111281287746
                    ],
                    [
                        34.41549485283264,
                        97.283756254127
                    ],
                    [
                        36.42017833821299,
                        98.51023290725634
                    ],
                    [
                        48.266909960703956,
                        99.59706074137782
                    ],
                    [
                        51.456108522323206,
                        92.58403315228024
                    ],
                    [
                        63.3249196091766,
                        93.6674491203431
                    ],
                    [
                        60.960257769931985,
                        110.16062344395255
                    ],
                    [
                        49.094275907517556,
                        109.06503983039966
                    ],
                    [
                        46.79115225196064,
                        109.89584392001012
                    ],
                    [
                        34.94618448639528,
                        108.80142524223687
                    ],
                    [
                        32.794444829218214,
                        108.60261473324032
                    ],
                    [
                        19.896204045991865,
                        107.41087863881067
                    ],
                    [
                        21.51514273595292,
                        96.10111281287746
                    ]
                ]
            }
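As a worked example, the axis-aligned box of this polygon is just the minimum and maximum of its x and y values. A minimal sketch using the points from the annotation above; the variable names are illustrative only:

```python
# Points copied from the "red" annotation above, as [x, y] pairs.
points = [
    [21.51514273595292, 96.10111281287746],
    [34.41549485283264, 97.283756254127],
    [36.42017833821299, 98.51023290725634],
    [48.266909960703956, 99.59706074137782],
    [51.456108522323206, 92.58403315228024],
    [63.3249196091766, 93.6674491203431],
    [60.960257769931985, 110.16062344395255],
    [49.094275907517556, 109.06503983039966],
    [46.79115225196064, 109.89584392001012],
    [34.94618448639528, 108.80142524223687],
    [32.794444829218214, 108.60261473324032],
    [19.896204045991865, 107.41087863881067],
    [21.51514273595292, 96.10111281287746],
]

# Split the pairs into one tuple of x values and one of y values.
xs, ys = zip(*points)

# Box in [xmin, ymin, xmax, ymax] order.
box = [min(xs), min(ys), max(xs), max(ys)]
print(box)
```

For this word the box has positive width and height, so the failing boxes presumably come from other annotations or from a later step.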

This is just one word out of I don’t know how many; I have 80k images with randomly generated words. I’m also sending an image with some other text, but all text instances are annotated in the same way.

I also have this code, and its output is:

if len(mask) == 0:
    raise ValueError("Can't compute bounding box of empty list")
minx, miny = float("inf"), float("inf")
maxx, maxy = float("-inf"), float("-inf")
for x, y in mask:
    # Set min coords
    if x < minx:
        minx = x
    if y < miny:
        miny = y
    # Set max coords
    if x > maxx:
        maxx = x
    elif y > maxy:
        maxy = y
boxes.append([minx, miny, maxx, maxy])

ValueError: All bounding boxes should have positive height and width. Found invalid box [594.2897338867188, 533.2000122070312, 366.35723876953125, 468.6328125] for target at index 0.
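One detail in the loop above: `elif y > maxy` only runs when `x > maxx` was false, so a point that raises maxx never gets to update maxy. A sketch of the same loop with four independent checks, wrapped in a hypothetical helper `polygon_to_box`; the three test points are made up so that every point increases x. This may not be the only cause of the invalid box reported above.

```python
def polygon_to_box(points):
    """Return [minx, miny, maxx, maxy] for a list of (x, y) points."""
    if len(points) == 0:
        raise ValueError("Can't compute bounding box of empty list")
    minx, miny = float("inf"), float("inf")
    maxx, maxy = float("-inf"), float("-inf")
    for x, y in points:
        # Each bound is checked independently of the others.
        if x < minx:
            minx = x
        if y < miny:
            miny = y
        if x > maxx:
            maxx = x
        if y > maxy:
            maxy = y
    return [minx, miny, maxx, maxy]


# Made-up points where x grows at every step; with `elif`, maxy would
# never be updated for this input and would stay at -inf.
box = polygon_to_box([(0.0, 0.0), (1.0, 5.0), (2.0, 3.0)])
print(box)
```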