I am going to fine-tune DETR on my own dataset, and I need to add some extra augmentation using albumentations. The question is: how should I format the targets in __getitem__? Should the boxes be in yolo, coco, or pascal_voc format?
The original dataset uses the coco format, [xmin, ymin, w, h], but I saw that the dataset code converts the boxes to normalized [xmin, ymin, xmax, ymax], like this:
boxes[:, 2:] += boxes[:, :2]
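For illustration, here is a minimal pure-Python sketch of what that in-place update does to a single coco-style box: adding the top-left corner to the width and height gives the bottom-right corner. The helper name xywh_to_xyxy and the sample values are made up for this example; they are not taken from the DETR code.

```python
def xywh_to_xyxy(box):
    """Convert one box from [xmin, ymin, w, h] to [xmin, ymin, xmax, ymax]."""
    xmin, ymin, w, h = box
    # Same arithmetic as boxes[:, 2:] += boxes[:, :2], applied to one box.
    return [xmin, ymin, xmin + w, ymin + h]


coco_box = [10.0, 20.0, 30.0, 40.0]  # hypothetical box in absolute pixels
print(xywh_to_xyxy(coco_box))  # [10.0, 20.0, 40.0, 60.0]
```

Note that this step alone does not normalize anything: the output is still in the same units as the input.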
But in post-processing, I see a function that converts [cx, cy, w, h] to [xmin, ymin, xmax, ymax]:
def box_cxcywh_to_xyxy():
...
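To make the two layouts concrete, here is a rough sketch of the center-format conversion and its inverse for a single box, written in plain Python. The function names and the sample box are assumptions for illustration only; this is not the body of the function above, which is elided here.

```python
def cxcywh_to_xyxy(box):
    """Convert [cx, cy, w, h] to [xmin, ymin, xmax, ymax]."""
    cx, cy, w, h = box
    return [cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h]


def xyxy_to_cxcywh(box):
    """Convert [xmin, ymin, xmax, ymax] back to [cx, cy, w, h]."""
    xmin, ymin, xmax, ymax = box
    return [(xmin + xmax) / 2, (ymin + ymax) / 2, xmax - xmin, ymax - ymin]


center_box = [0.5, 0.5, 0.25, 0.5]  # hypothetical normalized box
corner_box = cxcywh_to_xyxy(center_box)
print(corner_box)                  # [0.375, 0.25, 0.625, 0.75]
print(xyxy_to_cxcywh(corner_box))  # [0.5, 0.5, 0.25, 0.5]
```

The round trip returns the original box, so either layout carries the same information; what matters is that targets and predictions use the same one when they are compared.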
So, in which format should I prepare the target bboxes so that they can be compared with the predicted bboxes?