I’m confused by how bounding boxes are defined.
In the detection bbox format, a box's coordinates are written as ((xmin, ymin), (xmax, ymax)):
(xmin, ymin): the top-left corner of the bbox.
(xmax, ymax): the bottom-right corner of the bbox.
But when I plot a sample box on an x-y plane:

The box's corners are ((2, 3), (4, 1)), so going by the plot the format should be:
((xmin, ymax), (xmax, ymin))
So what is the logic behind the detection bbox format ((xmin, ymin), (xmax, ymax))?
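To make the two readings of the sample box concrete, here is a minimal Python sketch. It assumes that the detection format uses image-style coordinates, where the origin is at the top-left and y increases downward (common for pixel coordinates, but check the docs of the specific dataset or framework). Under that convention the top edge has the smaller y, while on a math-style plane with y pointing up the top edge has the larger y. The names `Box`, `normalize`, `top_left`, and `bottom_right` are hypothetical helpers for illustration only.

```python
from typing import NamedTuple, Tuple

Point = Tuple[float, float]


class Box(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def normalize(p1: Point, p2: Point) -> Box:
    """Order two opposite corners into (xmin, ymin, xmax, ymax)."""
    (x1, y1), (x2, y2) = p1, p2
    return Box(min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2))


def top_left(box: Box, y_axis_down: bool) -> Point:
    """Return the visually top-left corner under the given y-axis convention."""
    # y pointing down (assumed image/pixel convention): top edge has the smaller y.
    # y pointing up (math-style Cartesian plane): top edge has the larger y.
    return (box.xmin, box.ymin) if y_axis_down else (box.xmin, box.ymax)


def bottom_right(box: Box, y_axis_down: bool) -> Point:
    """Return the visually bottom-right corner under the given y-axis convention."""
    return (box.xmax, box.ymax) if y_axis_down else (box.xmax, box.ymin)


if __name__ == "__main__":
    box = normalize((2, 3), (4, 1))
    print(box)                                 # Box(xmin=2, ymin=1, xmax=4, ymax=3)
    print(top_left(box, y_axis_down=False))    # (2, 3)  -> (xmin, ymax)
    print(bottom_right(box, y_axis_down=False))  # (4, 1)  -> (xmax, ymin)
    print(top_left(box, y_axis_down=True))     # (2, 1)  -> (xmin, ymin)
    print(bottom_right(box, y_axis_down=True))   # (4, 3)  -> (xmax, ymax)
```

With y pointing up, the corners come out as ((xmin, ymax), (xmax, ymin)), which matches the plot above; with y pointing down, the same min/max values give ((xmin, ymin), (xmax, ymax)).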