What types of data augmentation can be used in a detection task?

I have the MSCOCO detection training data to train a network. Since my deep architecture has 76 layers composed of convolution, max-pooling, and batch-norm layers with LeakyReLU, I think the amount of training data is not sufficient. So I would like to know how I can augment my original training data, which has ~80k images.

As a reminder, in a classification task I can easily apply translation, rotation, scaling, shearing, and other affine transformations, since the labels are just a string or an integer that is not tied to a specific position, unlike detection labels. In detection we have to be more careful, because we have bounding boxes composed of 4 numbers (x1, y1, x2, y2). With this in mind, how can I do a suitable augmentation?

Thanks!

I have the same question…

I’d define your own Dataset. The default torchvision options don’t provide an easy way to sync a random augmentation across both the input and the target.
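As a rough illustration of that idea, here is a minimal sketch of a dataset that makes one random decision per sample and applies it to the image and its boxes together. It only follows the map-style `__len__`/`__getitem__` protocol and uses plain Python lists so it runs without any dependencies. The names `hflip_sample` and `FlipDetectionDataset` are hypothetical, and it assumes boxes are (x1, y1, x2, y2) in pixel coordinates with x running from 0 to the image width.

```python
import random


def hflip_sample(image, boxes):
    """Flip an image (list of pixel rows) and its boxes horizontally.

    Assumes boxes are (x1, y1, x2, y2) with x measured in [0, width].
    """
    width = len(image[0])
    flipped_image = [row[::-1] for row in image]
    # Mirroring swaps the roles of the left and right edges.
    flipped_boxes = [(width - x2, y1, width - x1, y2)
                     for (x1, y1, x2, y2) in boxes]
    return flipped_image, flipped_boxes


class FlipDetectionDataset:
    """Hypothetical map-style dataset: one random draw per sample,
    applied to both the input image and the target boxes."""

    def __init__(self, samples, p=0.5, seed=None):
        self.samples = samples          # list of (image, boxes) pairs
        self.p = p                      # probability of flipping
        self.rng = random.Random(seed)

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        image, boxes = self.samples[idx]
        if self.rng.random() < self.p:
            image, boxes = hflip_sample(image, boxes)
        return image, list(boxes)
```

The same pattern should extend to other transforms: draw the random parameters once inside `__getitem__`, then pass them to both the image transform and the box transform.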

I was building a model that had an image as a target, that was either an image loaded from disk, or generated on the fly from a list of points. I wrote a dataset that used random transforms to choose tiles from both the (very large) input image and either target image or target points using OpenCV. It was a rather quick experiment and I didn’t fully validate all details, but it seemed to be working… never got a chance to clean it up.

See the _crop_and_transform function on line 413 here; target_arr can be either an image or an array of points. The same affine transform matrix can be applied in both cases… you could do something similar if you put your bounding box corners into an array of x, y points. Keep in mind that if you’re shearing or rotating, the transformed corners no longer form an axis-aligned rectangle, so you’ll need to use them to pick an enclosing, axis-aligned box rather than using them directly.
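To make the enclosing-box step concrete, here is a minimal sketch in plain Python (this is not the code from the linked function). It assumes a 2x3 affine matrix stored as nested lists, the same layout that OpenCV's `cv2.warpAffine` accepts, and boxes given as (x1, y1, x2, y2). The helper names `rotation_about`, `apply_affine`, and `transform_box` are hypothetical.

```python
import math


def rotation_about(cx, cy, degrees):
    """Hypothetical helper: 2x3 matrix rotating points around (cx, cy)."""
    a = math.radians(degrees)
    cos_a, sin_a = math.cos(a), math.sin(a)
    return [
        [cos_a, -sin_a, cx - cos_a * cx + sin_a * cy],
        [sin_a, cos_a, cy - sin_a * cx - cos_a * cy],
    ]


def apply_affine(matrix, point):
    """Apply a 2x3 affine matrix to a single (x, y) point."""
    x, y = point
    (a, b, tx), (c, d, ty) = matrix
    return (a * x + b * y + tx, c * x + d * y + ty)


def transform_box(box, matrix):
    """Transform all four corners, then take the enclosing axis-aligned box."""
    x1, y1, x2, y2 = box
    corners = [(x1, y1), (x2, y1), (x2, y2), (x1, y2)]
    moved = [apply_affine(matrix, p) for p in corners]
    xs = [p[0] for p in moved]
    ys = [p[1] for p in moved]
    return (min(xs), min(ys), max(xs), max(ys))


# Rotating a 2x2 square by 45 degrees about its center grows the box,
# because the enclosing rectangle must cover the rotated corners.
rotated = transform_box((0, 0, 2, 2), rotation_about(1, 1, 45))
```

Note that the enclosing box is generally looser than the object after a rotation or shear. A real pipeline would probably also clip the result to the image bounds and drop boxes that end up empty or mostly outside the crop.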

Maybe this would help: https://github.com/lozuwa/impy

You can define very complex data augmentation pipelines just with a few lines of code.