How to resize the coordinates of bounding boxes in torchvision.datasets.CocoDetection()

I load the COCO dataset with torchvision.datasets.CocoDetection().
The example code from the official docs:
cap = dset.CocoDetection(root='dir where images are',
annFile='json annotation file', transform=None, target_transform=None)

The images in this dataset have various sizes. How can I resize the images to (416, 416) and rescale the coordinates of the bounding boxes accordingly?
e.g. original size=(640, 480), bounding box=[x, y, w, h]
processed size=(416, 416), bounding box=[x*416/640, y*416/480, w*416/640, h*416/480]

I know the argument transform=transforms.Resize([416, 416]) can resize the images, but how can I rescale those bounding-box coordinates efficiently?
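One way to keep image and boxes in sync is to do the resize yourself instead of passing transforms.Resize, so you have access to the scale factors. A minimal sketch using plain PIL (the helper name resize_with_boxes is mine, not a torchvision API):

```python
from PIL import Image

def resize_with_boxes(image, boxes, size=(416, 416)):
    """Resize a PIL image to `size` (height, width) and rescale
    [x, y, w, h] boxes by the same factors."""
    orig_w, orig_h = image.size          # PIL reports (width, height)
    new_h, new_w = size
    sx, sy = new_w / orig_w, new_h / orig_h
    resized = image.resize((new_w, new_h))
    scaled = [[x * sx, y * sy, w * sx, h * sy] for x, y, w, h in boxes]
    return resized, scaled

# a 640x480 image with one box, as in the example above
img = Image.new("RGB", (640, 480))
img, boxes = resize_with_boxes(img, [[100, 120, 50, 60]])
```

You could call this inside a wrapper Dataset's __getitem__, applying it to both dataset[i][0] and the 'bbox' entries of dataset[i][1] at the same time.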

Hmm, I don't really know which files you are using, but if you have a tensor or a NumPy array, the scaling is just new = scalar * old. That operation is optimized in both NumPy and PyTorch.

If you have a Python list you can convert it to a tensor using torch.tensor() or np.asarray().

I mean, I loaded a dataset of 40000+ samples. dataset[i][0] is a PIL image whose size varies, e.g. (480, 640) or (537, 450). dataset[i][1] is a tuple of dictionaries, each representing an annotation for one object in the picture. Denote such a dictionary as dict: dict['bbox'] is a list [x, y, width, height] representing the bounding box, and dict['category_id'] is an int representing the object's category.

What I want is images of size (416, 416), which fits my network, and the 'bbox' values should be rescaled at the same time as the image is resized.

The solution I use now is to process the images and annotations at training time, but I find loading the data slow (a batch tensor of shape (32, 3, 416, 416) together with its annotations takes about 3 seconds to load).

Previously I tried rescaling the bbox coordinates before training, but it would have taken about 4 hours to process the annotations of 40000+ images. I couldn't stand waiting that long…
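For what it's worth, rescaling the annotations up front should not take hours if it is vectorized: stack every bbox into one NumPy array and multiply by per-annotation scale factors in a single elementwise operation. A hypothetical sketch (all_boxes and scales are names I made up; the scale rows come from each box's source-image size):

```python
import numpy as np

# one [x, y, w, h] row per annotation, across the whole dataset
all_boxes = np.array([[100., 120., 50., 60.],
                      [10., 20., 30., 40.]])
# per-annotation [sx, sy, sx, sy] factors (416/orig_w, 416/orig_h)
scales = np.array([[416/640, 416/480, 416/640, 416/480],
                   [416/450, 416/537, 416/450, 416/537]])
scaled = all_boxes * scales   # one vectorized multiply for every box
```

At this scale the multiply takes milliseconds, so 4 hours probably came from a Python-level loop over the annotations or from re-decoding the images.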

Did you find an efficient way to do this?

A very convenient solution would probably be to normalize the boxes, similarly to the YOLO format: (x, y, w, h) are made relative to the width and height of the image. In code:

# image is a not-yet-resized numpy array of shape (height, width, channels)
normalized_x = absolute_x / image.shape[1]            # x is relative to width
normalized_y = absolute_y / image.shape[0]            # y is relative to height
normalized_width = absolute_width / image.shape[1]
normalized_height = absolute_height / image.shape[0]

Now the normalized coordinates stay valid under any resize: multiply them by the new width and height to recover the absolute coordinates in the resized image.
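A minimal round-trip sketch of that idea (plain Python; the helper names are mine):

```python
def normalize_box(box, height, width):
    """[x, y, w, h] in pixels -> fractions of the image size."""
    x, y, w, h = box
    return [x / width, y / height, w / width, h / height]

def denormalize_box(box, height, width):
    """Fractions of the image size -> [x, y, w, h] in pixels."""
    x, y, w, h = box
    return [x * width, y * height, w * width, h * height]

# box on the original 480x640 (height, width) image
norm = normalize_box([100, 120, 50, 60], 480, 640)
# the same box expressed on the resized 416x416 image
resized_box = denormalize_box(norm, 416, 416)
```

This gives the same result as scaling by 416/orig_w and 416/orig_h directly, but the normalized form can be stored once and reused for any target size.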

Not to mention that you should take care of what x and y represent in your annotation format, as they sometimes refer to the center rather than the top-left corner of the bounding box (COCO bboxes use the top-left corner; YOLO uses the center).
