Offline and online data augmentation together

Pracheta_Sahoo · April 14, 2020, 12:42am

Can I apply offline and online data augmentation at the same time? I have a dataset of nearly 3500 images that is highly class-imbalanced. To balance the classes, I used Python’s Augmentor package to expand it to 12000 images. This was the offline image augmentation step, and I chose 1000 samples from each data folder. My learning algorithm already has online augmentation implemented in the DataLoader class. The code snippet is below:

# tfs is assumed to be torchvision.transforms; RandAugmentMC is not part of
# torchvision and has to be defined elsewhere.
self.weak = tfs.Compose([
    tfs.Resize(resize_shape),
    tfs.RandomHorizontalFlip(),
    tfs.RandomCrop(size=crop_shape, padding=int(crop_shape * 0.125), padding_mode='reflect'),
])
self.strong = tfs.Compose([
    tfs.Resize(resize_shape),
    tfs.RandomHorizontalFlip(),
    tfs.RandomCrop(size=crop_shape, padding=int(crop_shape * 0.125), padding_mode='reflect'),
    RandAugmentMC(n=2, m=10),
])
self.normalize = tfs.Compose([
    tfs.ToTensor(),
    tfs.Normalize(mean=mean, std=std),
])

My question: this process improves the accuracy, but is it correct?

ptrblck · April 14, 2020, 2:34am

The approach might work.
Could you explain the difference between the offline and online data augmentation?
If you’ve applied the same transformations in both, I’m not sure you should expect any benefit from this workflow, and you could probably just use the online augmentation.

Pracheta_Sahoo · April 14, 2020, 2:58am

Thanks for your reply. In the offline augmentation, I applied rotation and zoom to the images. In the online augmentation, I applied RandomCrop, flip, Resize, etc., as shown in the code snippet above. Would it work?

ptrblck · April 14, 2020, 6:22am

Yes, I don’t see a problem in the approach.

Pracheta_Sahoo · April 14, 2020, 6:24am

Thanks. But is there any suitable scientific source to support my approach?

ptrblck · April 14, 2020, 6:31am

I don’t think so, since you could achieve the same result with the (lazy) online transformation and you wouldn’t need to store the offline transformed images.

Pracheta_Sahoo · April 14, 2020, 6:33am

@ptrblck I am doing offline and online image augmentation together, and that way the accuracy is higher.

ptrblck · April 14, 2020, 6:34am

I think the accuracy might be higher due to the larger epochs (more samples per epoch) and thus more training iterations.
For how many iterations and epochs are you training both models?
Also, are you comparing your offline + online augmentation approach to an online approach with the same transformations?
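To make the iteration count concrete, here is a rough calculation using the dataset sizes mentioned above (about 3500 original images, 12000 after offline augmentation). The batch size and epoch count are assumptions for illustration only, and the helper names are hypothetical. It also assumes the last partial batch is kept.

```python
import math

def total_iterations(num_samples, batch_size, epochs):
    # One iteration per batch; the last partial batch is counted.
    return math.ceil(num_samples / batch_size) * epochs

def epochs_to_match(target_iterations, num_samples, batch_size):
    # Smallest number of epochs that reaches at least target_iterations.
    return math.ceil(target_iterations / math.ceil(num_samples / batch_size))

batch_size = 64  # assumed, not stated in the thread
epochs = 100     # assumed, not stated in the thread

combined = total_iterations(12000, batch_size, epochs)   # offline + online
online_only = total_iterations(3500, batch_size, epochs) # online only
matched = epochs_to_match(combined, 3500, batch_size)

print(combined, online_only, matched)
```

With these assumed settings, the enlarged dataset gets more than three times as many weight updates for the same number of epochs, so the online-only run would need proportionally more epochs for a fair comparison.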