I load the COCO dataset with torchvision.datasets.CocoDetection().
The example code from the official docs:
    cap = dset.CocoDetection(root='dir where images are',
                             annFile='json annotation file',
                             transform=None, target_transform=None)
The images in this dataset have various sizes. How can I resize the images to (416, 416) and rescale the bounding box coordinates accordingly?
e.g. original size = (640, 480), bounding box = [x, y, w, h]
processed size = (416, 416), bounding box = [x * 416/640, y * 416/480, w * 416/640, h * 416/480]
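In code, the scaling I have in mind would be something like this (rescale_bbox is a made-up name):

    # Hypothetical helper: rescale one COCO-style [x, y, w, h] box
    # from the original image size to the target size.
    def rescale_bbox(bbox, orig_size, target_size=(416, 416)):
        orig_w, orig_h = orig_size
        target_w, target_h = target_size
        x, y, w, h = bbox
        return [x * target_w / orig_w,
                y * target_h / orig_h,
                w * target_w / orig_w,
                h * target_h / orig_h]

    # rescale_bbox([100, 50, 200, 120], orig_size=(640, 480))
    # returns [65.0, 43.33..., 130.0, 104.0]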
I know that passing transform=transforms.Resize([416, 416]) resizes the images, but how can I modify the bounding box coordinates efficiently?
I mean, I loaded a dataset of length 40,000+. dataset[i][0] is a PIL image, and the sizes vary, e.g. (480, 640) or (537, 450). dataset[i][1] is a tuple containing some dictionaries, where every dictionary represents an annotation for one object in the picture; denote this dictionary as dict. dict['bbox'] is a list [x, y, width, height] representing the bounding box, and dict['category_id'] is an int representing the object's category.
What I want is to get images of size (416, 416), since this size fits my network, and the 'bbox' values should be rescaled at the same time the picture is resized.
The solution I take now is to process the images and annotations at training time, but I find the data loading slow: a batch of 32 (a tensor of shape (32, 3, 416, 416)) together with its annotations takes about 3 seconds to load.
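Roughly along these lines (a sketch of the idea; ResizedCocoDetection is a made-up name):

    import torchvision.datasets as dset
    import torchvision.transforms as transforms

    class ResizedCocoDetection(dset.CocoDetection):
        # Resizes every image to a fixed size and rescales each
        # annotation's 'bbox' by the same factors.
        def __init__(self, root, annFile, size=(416, 416)):
            super().__init__(root, annFile)
            self.size = size
            self.resize = transforms.Resize(size)  # size is (h, w)

        def __getitem__(self, index):
            img, anns = super().__getitem__(index)
            orig_w, orig_h = img.size  # PIL reports (width, height)
            new_h, new_w = self.size
            img = self.resize(img)
            sx, sy = new_w / orig_w, new_h / orig_h
            out = []
            for ann in anns:
                ann = dict(ann)  # copy so the cached COCO annotations stay untouched
                x, y, w, h = ann['bbox']
                ann['bbox'] = [x * sx, y * sy, w * sx, h * sy]
                out.append(ann)
            return img, out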
Previously I tried to change the bbox coordinates before training, but it would have taken about 4 hours to process the annotations of 40,000+ images. I couldn't bear to wait that long…
A very convenient solution would probably be to normalize the boxes, similarly to the YOLO format: (x, y, w, h) would then be expressed relative to the width and height of the image. Speaking in code:
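A sketch (normalize_bbox is a made-up name):

    # Make a COCO [x, y, w, h] box relative to the image size,
    # so it stays valid under any resize.
    def normalize_bbox(bbox, img_size):
        img_w, img_h = img_size
        x, y, w, h = bbox
        return [x / img_w, y / img_h, w / img_w, h / img_h]

    # After resizing the image to (416, 416), pixel coordinates are
    # recovered by multiplying each normalized value by 416.

Since this is only four divisions per box, it is cheap enough to do on the fly in the data loader, and the normalized boxes remain correct for any target size.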