Hi, I’m working with a Mask R-CNN network in PyTorch and need to load my dataset. I followed the TorchVision Object Detection Finetuning Tutorial (PyTorch Tutorials 2.2.0+cu121 documentation). The polygon bounding-box coordinates of every word in each image are stored in a JSON file, and I don’t know how to compute the rectangular bounding box from those polygon coordinates.
I’m using train_labels.json from the ArT (ICDAR2019) dataset:
f = open('train_labels.json')
data = json.load(f)
boxes = []
for i in data.keys():
    for j in data[i]:
        mask = j['points']  # --> polygon BB coordinates
        obj_ids = np.unique(mask)
        obj_ids = obj_ids[1:]
        num_objs = len(obj_ids)
The variable mask holds the polygon bounding-box coordinates.
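A minimal sketch of one way to organise the loop above so that the polygons stay grouped per image instead of ending up in one global list. The JSON excerpt, the image ids `gt_0` and `gt_1`, and the helper name `polygons_per_image` are hypothetical; only the `{image_id: [{"points": ...}, ...]}` layout is taken from the code above.

```python
import json

import numpy as np

# Hypothetical two-image excerpt in the same layout as train_labels.json:
# {image_id: [{"points": [[x, y], ...], ...}, ...]}
sample_json = """
{
  "gt_0": [{"transcription": "a",
            "points": [[1.0, 2.0], [5.0, 2.0], [5.0, 4.0], [1.0, 4.0]]}],
  "gt_1": [{"transcription": "b",
            "points": [[0.0, 0.0], [3.0, 1.0], [2.0, 6.0]]},
           {"transcription": "c",
            "points": [[7.0, 7.0], [9.0, 7.0], [9.0, 8.0], [7.0, 8.0]]}]
}
"""


def polygons_per_image(data):
    """Map each image id to a list of (N, 2) float arrays, one per word."""
    result = {}
    for image_id, annotations in data.items():
        result[image_id] = [
            np.asarray(ann["points"], dtype=np.float64) for ann in annotations
        ]
    return result


data = json.loads(sample_json)
polygons = polygons_per_image(data)
for image_id, polys in polygons.items():
    print(image_id, [p.shape for p in polys])
```

Keeping one list per image id makes it easier to build one target dictionary per image later, which is what the detection models expect.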
When I use:
left, right = min(mask, key=lambda p: p[0]), max(mask, key=lambda p: p[0])
top, bottom = min(mask, key=lambda p: p[1]), max(mask, key=lambda p: p[1])
boxes.append([left[0],top[1],right[0],bottom[1]])
then the output is
ValueError: All bounding boxes should have positive height and width. Found invalid box [24.994787216186523, 472.875732421875, 46.65693664550781, 472.875732421875] for target at index 0.
For some reason, the second and the last coordinates (ymin and ymax) are the same. Why?
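A minimal sketch of the same min/max computation with NumPy, plus a check that flags boxes with zero width or height before they reach the model. Equal ymin and ymax would result if every vertex of a polygon had the same y value; whether such polygons actually occur in the ArT labels is an assumption and has not been verified here. The helper names and the `min_size` threshold are hypothetical.

```python
import numpy as np


def polygon_to_box(points):
    """Return [xmin, ymin, xmax, ymax] for a list of [x, y] vertices."""
    pts = np.asarray(points, dtype=np.float64)
    xmin, ymin = pts.min(axis=0)  # column-wise minimum over all vertices
    xmax, ymax = pts.max(axis=0)  # column-wise maximum over all vertices
    return [float(xmin), float(ymin), float(xmax), float(ymax)]


def has_positive_area(box, min_size=1.0):
    """True if the box is at least min_size wide and tall (assumed threshold)."""
    xmin, ymin, xmax, ymax = box
    return (xmax - xmin) >= min_size and (ymax - ymin) >= min_size


good = [[10.0, 20.0], [30.0, 22.0], [28.0, 40.0], [9.0, 38.0]]
flat = [[24.99, 472.88], [46.66, 472.88]]  # hypothetical: all y values equal

boxes = []
for polygon in (good, flat):
    box = polygon_to_box(polygon)
    if has_positive_area(box):
        boxes.append(box)
    else:
        print("skipping degenerate box:", box)
```

Printing the skipped boxes together with their image id would show whether the problem is in the annotations themselves or is introduced later, for example by a resize or crop transform.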
But if I use the code from the PyTorch tutorial:
for z in range(num_objs):
    pos = np.where(masks[z])
    xmin = np.min(pos[1])
    xmax = np.max(pos[1])
    ymin = np.min(pos[0])
    ymax = np.max(pos[0])
    #print(xmin, xmax,ymin,ymax)
    boxes.append([xmin, ymin, xmax, ymax])
boxes = torch.as_tensor(boxes, dtype=torch.float32)
target = {}
target["boxes"] = boxes
Output is:
ValueError: All bounding boxes should have positive height and width. Found invalid box [0.0, 0.0, 0.0, 0.0] for target at index 0.
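The tutorial code works on a segmentation image in which every pixel stores an object id, so `np.unique` and `np.where` operate on pixels. Here `mask` is a list of polygon vertices, so those calls presumably operate on coordinate values instead. A minimal sketch of turning one polygon into a binary pixel mask first, assuming Pillow is available and that the image height and width are known; `polygon_to_mask`, `mask_to_box`, `height`, and `width` are hypothetical names.

```python
import numpy as np
from PIL import Image, ImageDraw


def polygon_to_mask(points, height, width):
    """Rasterize polygon vertices into a (height, width) array of 0s and 1s."""
    canvas = Image.new("L", (width, height), 0)  # PIL sizes are (width, height)
    xy = [(float(x), float(y)) for x, y in points]
    ImageDraw.Draw(canvas).polygon(xy, outline=1, fill=1)
    return np.array(canvas, dtype=np.uint8)


def mask_to_box(mask):
    """Same np.where logic as the tutorial, applied to one binary mask."""
    pos = np.where(mask)  # pos[0] holds row (y) indices, pos[1] column (x) indices
    return [int(pos[1].min()), int(pos[0].min()),
            int(pos[1].max()), int(pos[0].max())]


# Hypothetical rectangle-shaped polygon on a 10 x 12 image.
polygon = [[2, 3], [8, 3], [8, 6], [2, 6]]
mask = polygon_to_mask(polygon, height=10, width=12)
box = mask_to_box(mask)
print(mask.shape, box)
```

Mask R-CNN also needs per-instance masks in the target, so rasterized masks like this would be required anyway; the box can then come either from the mask or directly from the polygon vertices.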
My annotation for the word “red” is:
{
    "transcription": "red",
    "language": "Latin",
    "illegibility": false,
    "points": [
        [21.51514273595292, 96.10111281287746],
        [34.41549485283264, 97.283756254127],
        [36.42017833821299, 98.51023290725634],
        [48.266909960703956, 99.59706074137782],
        [51.456108522323206, 92.58403315228024],
        [63.3249196091766, 93.6674491203431],
        [60.960257769931985, 110.16062344395255],
        [49.094275907517556, 109.06503983039966],
        [46.79115225196064, 109.89584392001012],
        [34.94618448639528, 108.80142524223687],
        [32.794444829218214, 108.60261473324032],
        [19.896204045991865, 107.41087863881067],
        [21.51514273595292, 96.10111281287746]
    ]
}
This is just one word out of many; I have 80k images with randomly generated words. I’m also attaching an image with some other text, but all text instances are annotated the same way.
I also tried the following code, and the output is:
if len(mask) == 0:
    raise ValueError("Can't compute bounding box of empty list")
minx, miny = float("inf"), float("inf")
maxx, maxy = float("-inf"), float("-inf")
for x, y in mask:
    # Set min coords
    if x < minx:
        minx = x
    if y < miny:
        miny = y
    # Set max coords
    if x > maxx:
        maxx = x
    elif y > maxy:
        maxy = y
boxes.append([minx, miny, maxx, maxy])
ValueError: All bounding boxes should have positive height and width. Found invalid box [594.2897338867188, 533.2000122070312, 366.35723876953125, 468.6328125] for target at index 0.
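One detail in the loop above: because the last comparison is an `elif`, `maxy` is only updated for points that do not also set a new `maxx`, so the maximum y can be missed. A minimal sketch with four independent comparisons follows; the function name `bounding_box` and the sample points are hypothetical. The `elif` by itself would not explain xmin being larger than xmax in the reported box, since the first point always sets both; that may have a different cause that the snippet does not show.

```python
def bounding_box(points):
    """Return [minx, miny, maxx, maxy] for an iterable of (x, y) pairs."""
    if len(points) == 0:
        raise ValueError("Can't compute bounding box of empty list")
    minx = miny = float("inf")
    maxx = maxy = float("-inf")
    for x, y in points:
        # Each comparison is independent, so one point can update
        # several extremes at once.
        if x < minx:
            minx = x
        if y < miny:
            miny = y
        if x > maxx:
            maxx = x
        if y > maxy:  # plain `if`, not `elif`
            maxy = y
    return [minx, miny, maxx, maxy]


# Hypothetical points where x grows on every step: with `elif`, the
# y-maximum branch would never run and maxy would stay at -inf.
points = [[1, 1], [2, 5], [3, 2]]
print(bounding_box(points))
```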