Training Mask R-CNN with transforms

I am working through the tutorials and want to apply transformations to get more variety in the training images.
In the tutorials, the data preparation during training is done in the Dataset.__getitem__ method.

As I understand it, this would lead to a procedure like this:

  • feed dataset to training loop without transformations, run through epochs
  • feed dataset to training loop with a set of transformations, run through epochs
  • repeat previous step till all desired transformations have been applied

Can this lead to a good model, or should the training instead use a combined dataset and sample from it randomly?

I’m not sure which tutorial you are referring to, but commonly the transformations are applied on each sample and there won’t be an epoch with “raw” samples (i.e. where the transformations weren’t applied to the samples).

Also, especially during training the transformations are “random”, i.e. there won’t be a point where all transformations were applied.
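
To make this concrete, here is a minimal, dependency-free sketch of on-the-fly augmentation. It is only an illustration: `AugmentedDataset`, `flip_prob` and the tiny nested-list "image" are hypothetical names made up for this example, and in a real project the class would subclass torch.utils.data.Dataset and use torchvision transforms. It shows that the dataset length stays the same while the same index can return a different variant on each access.

```python
import random


class AugmentedDataset:
    """Hypothetical stand-in for a torch.utils.data.Dataset subclass.

    It follows the same __len__/__getitem__ protocol, but uses plain
    Python lists so the sketch runs without any dependencies.
    """

    def __init__(self, images, flip_prob=0.5, seed=None):
        self.images = images
        self.flip_prob = flip_prob
        self.rng = random.Random(seed)

    def __len__(self):
        # The number of samples never changes, no matter how many
        # random variants are produced over the epochs.
        return len(self.images)

    def __getitem__(self, idx):
        image = self.images[idx]
        # Random horizontal flip, decided anew on every access.
        if self.rng.random() < self.flip_prob:
            image = [row[::-1] for row in image]
        return image


images = [[[1, 2, 3], [4, 5, 6]]]  # one tiny 2x3 "image"
dataset = AugmentedDataset(images, seed=0)

# Access the same sample ten times, as ten epochs would.
variants = {tuple(map(tuple, dataset[0])) for _ in range(10)}
print(len(dataset), len(variants))
```

Since the transformation is sampled inside `__getitem__`, every epoch sees a freshly augmented version of each sample, and the stored original is never modified.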

I was referring to the tutorials at [1] and [2], where the transforms are optional. This gave me the impression that calls with and without transforms are desired.

Does it make sense to run training sessions with different transforms, or should all transformations be given to the Dataset at once, letting the randomness decide which ones are used in each epoch?

[1] TorchVision Object Detection Finetuning Tutorial — PyTorch Tutorials 1.12.0+cu102 documentation
[2] Datasets & DataLoaders — PyTorch Tutorials 1.12.0+cu102 documentation

I think it depends on your use case, but the common approach would be to pass all (random) transformations to the Dataset. I would also recommend checking reference implementations to see how these models are trained. If you want to select some transforms randomly, you could use torchvision.transforms.RandomChoice.

OK, I think I made the wrong assumptions. I thought it was good for training to use the original data and transform it in several ways to get some variation. If my dataset has 20,000 samples, applying random transformations like this still yields a dataset of 20,000 images. My assumption was that I would at minimum use the original dataset and then transform each image multiple times with different transformations to increase variety, ending up with a training dataset of 100,000 images.
So one of my questions was whether I should build a 100,000-image dataset that is used with shuffled sampling, or use the 20,000 images multiple times with different transformations.
But then I wonder whether it is good to train with those transformations one after the other, since that might introduce some bias into the model.