Opinion about Batch Preprocessing vs Real Preprocessing for images

Hello,
I’d like your opinion on some approaches for applying preprocessing to images for deep learning (e.g., semantic segmentation).

Important note:

  • Take into account that this would be performed in a pipeline, so every new training run would re-apply it (assume, for instance, that the disk is emptied between training runs). Also take for granted that we have access to both CPUs and GPUs.

Approach 1: Batch transforms, real-time data augmentation

This approach would apply the deterministic transforms (resize, rescale by /255, ToTensor) in a batch and save the transformed images to disk before training.
Then during training, only the data augmentation would be left to apply on the fly (for the training set only).

TLDR: Transforms (images) => onDisk => Training + Data aug on the fly
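To make Approach 1 concrete, here is a minimal, framework-free sketch in plain Python (in practice this would be e.g. a `torch.utils.data.Dataset` with torchvision transforms; the file layout, 1×N toy "images", and flip augmentation are assumptions for illustration):

```python
import os
import pickle
import random
import tempfile

def preprocess(img):
    """Deterministic transforms: here just rescale uint8 [0, 255] -> float [0, 1]."""
    return [[px / 255.0 for px in row] for row in img]

def cache_to_disk(images, out_dir):
    """Apply the deterministic transforms once and save the results to disk."""
    paths = []
    for i, img in enumerate(images):
        p = os.path.join(out_dir, f"img_{i}.pkl")
        with open(p, "wb") as f:
            pickle.dump(preprocess(img), f)
        paths.append(p)
    return paths

def load_batch(paths, augment):
    """During training only the random augmentation runs; re-drawn every epoch."""
    batch = []
    for p in paths:
        with open(p, "rb") as f:
            img = pickle.load(f)
        if augment and random.random() < 0.5:   # e.g. a random horizontal flip
            img = [row[::-1] for row in img]
        batch.append(img)
    return batch

# Usage: cache once, then every epoch reads cached images + fresh augmentation.
raw = [[[0, 128, 255]], [[255, 0, 0]]]          # two tiny 1x3 "images"
with tempfile.TemporaryDirectory() as d:
    paths = cache_to_disk(raw, d)
    epoch_batch = load_batch(paths, augment=True)
print(len(epoch_batch))  # 2
```

The key property is that `preprocess` runs exactly once per image, while the augmentation inside `load_batch` is re-drawn on every epoch.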

Approach 2: Batch transforms, batch data augmentation

This approach would apply the transforms (resize, rescale by /255, ToTensor) and the data augmentation for the training set, then save everything to disk before training.
Then during training, we’d use a dataset with transforms=None, meaning no transforms are applied; it would simply read the already-processed images.

TLDR : Transforms(images) => Data Aug(images) => onDisk => Training

Approach 3: Batch data augmentation, real-time training + transforms

Same principle, but applied to the data augmentation instead (for the training set):

Data augmentation (images) => onDisk => Training + Transforms on the fly

Approach 4: Real-time transforms, real-time data augmentation

Here nothing is saved to disk besides the raw train/valid/test images.

Training + Transforms + Data augmentation on the fly.
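Approach 4 can be sketched as a dataset that does all the work per sample (again a framework-free sketch in plain Python; in PyTorch this would be a `torch.utils.data.Dataset` whose `__getitem__` applies the transform pipeline — the class name and flip augmentation are assumptions):

```python
import random

class OnTheFlyDataset:
    """Sketch of Approach 4: only raw images are stored; all transforms and
    augmentation run per sample on access, so augmentation differs each epoch."""

    def __init__(self, raw_images, train=True):
        self.raw = raw_images
        self.train = train

    def __len__(self):
        return len(self.raw)

    def __getitem__(self, i):
        # Deterministic transform: rescale uint8 [0, 255] -> float [0, 1].
        img = [[px / 255.0 for px in row] for row in self.raw[i]]
        # Random augmentation, training set only (e.g. horizontal flip).
        if self.train and random.random() < 0.5:
            img = [row[::-1] for row in img]
        return img

ds = OnTheFlyDataset([[[0, 255]], [[128, 128]]], train=True)
sample = ds[0]   # transforms + augmentation computed fresh on each access
```

Nothing is cached: the cost is paid at every epoch, but each epoch sees newly randomized samples.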

Important things to keep in mind:

  • If we use any batch approach, we could decide to perform the preprocessing on the CPU or the GPU; this might or might not be advantageous.
  • We might also want to consider whether keeping the processed data on disk is useful for reproducibility and lineage (e.g. to go back from a model in production and see exactly what it was trained on).
  • We must also keep in mind that if we perform the data augmentation on the fly, it is re-applied with new random parameters each epoch, which might or might not improve model accuracy.

So what do you guys think?

  1. This sounds like a valid approach: the data augmentation is applied to each sample during training, and caching the deterministic transforms could save some preprocessing time. In any case, I would profile image decoding (e.g. JPEG decoding) vs. loading the raw binary data (both speed and file size). E.g. loading a 1200x1600 JPEG-encoded file with a size of ~310 kB results in a tensor of ~23 MB, since raw pixels are stored if you are not resizing it.

  2. This approach wouldn’t use any random data augmentation during training, so I would claim it’s invalid. Even if you apply the augmentation once, this just creates a “new”, fixed dataset (with flipped, cropped, etc. images) which won’t be randomly transformed during training.

  3. Same as 1., except that the augmented images are also stored on disk, which I believe shouldn’t yield any advantage.

  4. Standard approach, so valid.
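As a back-of-envelope check of the file-size comparison in point 1: an unresized raw tensor costs height × width × channels × bytes-per-element, which for a 1200x1600 RGB image stored as float32 gives:

```python
# Raw float32 tensor size for a 1200x1600 RGB image (no resizing),
# vs. the ~310 kB JPEG file it was decoded from.
h, w, channels, bytes_per_float32 = 1200, 1600, 3, 4
raw_bytes = h * w * channels * bytes_per_float32
print(raw_bytes / 1e6)  # 23.04 -> ~23 MB, roughly 75x the JPEG file
```

This is why caching decoded, unresized tensors to disk can blow up storage and I/O compared to keeping the compressed files and decoding on the fly.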
