What types of data augmentation can be used in a detection task?

mderakhshani · April 17, 2017, 11:14am

I have the MSCOCO detection training data and want to train a network on it. Since my network has 76 layers, composed of convolution, max-pool, and batch-norm layers with LeakyReLU, I don't think the amount of training data is sufficient. How can I augment my original training set, which has ~80k images?

As a reminder, in a classification task I can easily apply translation, rotation, scaling, shearing, and other affine transformations, because the label is just a string or an integer that is not tied to a specific position, unlike detection labels. In detection we need to be more careful, because the labels are bounding boxes made up of 4 numbers (x1, y1, x2, y2). With this in mind, how can I do suitable augmentation?

Thanks!

squirrel · August 6, 2017, 12:26pm

I have the same question…

rwightman · August 6, 2017, 7:06pm

I’d define your own Dataset. The default torchvision options don’t offer an easy way to apply the same random augmentation to both input and target.
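
As a rough illustration of that idea (not the code from my repo), here is a minimal sketch of a dataset that draws one random decision per sample and applies it to both the image and its boxes. The class name FlipDetectionDataset and its arguments are made up for this example. It assumes images are H x W x C NumPy arrays and boxes are (x1, y1, x2, y2) in continuous pixel coordinates, so a horizontal flip maps x to width - x. An object with __len__ and __getitem__ like this can be passed to a PyTorch DataLoader as a map-style dataset.

```python
import random

import numpy as np


class FlipDetectionDataset:
    """Hypothetical map-style dataset with a synchronized horizontal flip."""

    def __init__(self, images, boxes, flip_prob=0.5, seed=None):
        self.images = images        # list of H x W x C arrays
        self.boxes = boxes          # list of (N, 4) arrays: x1, y1, x2, y2
        self.flip_prob = flip_prob
        self.rng = random.Random(seed)

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        img = self.images[idx]
        boxes = np.array(self.boxes[idx], dtype=np.float32).reshape(-1, 4)

        # One random draw decides the transform for BOTH image and target.
        if self.rng.random() < self.flip_prob:
            width = img.shape[1]
            img = img[:, ::-1].copy()
            # Assumption: continuous coordinates, so x -> width - x.
            # The old right edge becomes the new left edge and vice versa.
            new_x1 = width - boxes[:, 2]
            new_x2 = width - boxes[:, 0]
            boxes[:, 0] = new_x1
            boxes[:, 2] = new_x2

        return img, boxes
```

Other flips and crops follow the same pattern: sample the random parameters once inside __getitem__, then use them for the image and for the box coordinates.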

I was building a model that had an image as a target, that was either an image loaded from disk, or generated on the fly from a list of points. I wrote a dataset that used random transforms to choose tiles from both the (very large) input image and either target image or target points using OpenCV. It was a rather quick experiment and I didn’t fully validate all details, but it seemed to be working… never got a chance to clean it up.

See the _crop_and_transform function on line 413 here; target_arr can be either an image or an array of points. The same affine transform matrix can be applied in both cases… you could do something similar if you put your bounding box points into an array of x, y points. Keep in mind that if you’re shearing or rotating, since the points represent a bounding box, you’ll need to use the transformed points to pick an enclosing, axis-aligned box rather than using them directly.

https://github.com/rwightman/pytorch-countception-sealion/blob/master/dataset.py#L413


            y_max = h
        x_min += pw
        x_max -= pw
        y_min += ph
        y_max -= ph
        assert x_max - x_min > 0 and y_max - y_min > 0
        cx = random.randint(x_min, x_max)
        cy = random.randint(y_min, y_max)
    return cx, cy


def _crop_and_transform(self, cx, cy, input_img, target_arr, randomize=False):
    target_tile = None
    transform_target = False if target_arr is None else True
    target_is_coords = True if transform_target and target_arr.shape[1] == 3 else False


    if randomize:
        angle = 0.
        hflip = random.random() < 0.5
        vflip = random.random() < 0.5
        do_rotate = random.random() < 0.25 if not hflip and not vflip else False
        if do_rotate:
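
For the bounding-box case, the enclosing-box step described above could look roughly like the sketch below. It is not taken from the linked file; the helper names rotation_matrix and transform_boxes are hypothetical. It assumes a 2x3 affine matrix in the same layout that cv2.warpAffine accepts, boxes given as (x1, y1, x2, y2) in pixels, and that boxes left with no area after clipping to the image should be dropped.

```python
import numpy as np


def rotation_matrix(cx, cy, angle_deg):
    """Hypothetical helper: 2x3 matrix for a rotation about (cx, cy)."""
    a = np.deg2rad(angle_deg)
    cos, sin = np.cos(a), np.sin(a)
    return np.array([
        [cos, sin, (1 - cos) * cx - sin * cy],
        [-sin, cos, sin * cx + (1 - cos) * cy],
    ])


def transform_boxes(boxes, M, width, height):
    """Apply a 2x3 affine matrix to boxes, return enclosing axis-aligned boxes."""
    boxes = np.asarray(boxes, dtype=np.float64).reshape(-1, 4)
    x1, y1, x2, y2 = boxes.T

    # All four corners of every box, shape (N, 4, 2).
    corners = np.stack([
        np.stack([x1, y1], axis=1),
        np.stack([x2, y1], axis=1),
        np.stack([x2, y2], axis=1),
        np.stack([x1, y2], axis=1),
    ], axis=1)

    # Homogeneous coordinates so the translation column is applied too.
    ones = np.ones((*corners.shape[:2], 1))
    moved = np.concatenate([corners, ones], axis=2) @ M.T  # (N, 4, 2)

    # Enclosing axis-aligned box: min and max over the transformed corners.
    out = np.concatenate([moved.min(axis=1), moved.max(axis=1)], axis=1)

    # Assumption: clip to the output image and drop boxes with no area left.
    out[:, [0, 2]] = out[:, [0, 2]].clip(0, width)
    out[:, [1, 3]] = out[:, [1, 3]].clip(0, height)
    keep = (out[:, 2] > out[:, 0]) & (out[:, 3] > out[:, 1])
    return out[keep]
```

Note that the enclosing box is usually looser than the object after a rotation (a square rotated by 45 degrees gets a box about 1.41 times wider), so large angles make the labels less tight.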

Rodrigo_Loza · May 11, 2018, 7:42pm

Maybe this would help: https://github.com/lozuwa/impy

You can define very complex data augmentation pipelines with just a few lines of code.