Purpose of image augmentation

I can think of two possible reasons for doing image augmentation. I’m not sure whether one of them is more correct or whether both are.

  1. Create “realistic” variations of the training images that artificially grow the dataset size.
  2. Make it more difficult for the network to “memorize” the training data by introducing random “noise”. Some transformations, such as blurring and changes in perspective, would create images that are not representative of “in the wild” images, but they might force the network to “think more” instead of “memorizing” the training set.

Which of these two reasons, or both, is generally understood to be the purpose of image augmentation? I ask because some transformations, such as blurring and changes in brightness, look interesting. But in my use case blurred images, for example, would not be found “in the wild”, so I’m wondering whether it’s appropriate to use them.

I always assumed that data augmentation should fill the “real data domain” with randomly transformed samples, i.e. the transformations should stay close to what you would expect to see in the wild (but I might not be up to date on this topic).
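
To make that idea concrete, here is a minimal sketch of an augmentation step that only applies transformations within a range you consider plausible for your data. Everything in it is an assumption for illustration: the function name `augment`, the parameters `max_brightness_shift` and `flip_prob`, their default values, and the convention that images are H x W x C float arrays in [0, 1]. It uses plain NumPy rather than any particular augmentation library.

```python
import numpy as np


def augment(image, rng, max_brightness_shift=0.1, flip_prob=0.5):
    """Return a randomly transformed copy of `image`.

    `image` is assumed to be an H x W x C float array with values in [0, 1].
    The parameter names and defaults are hypothetical; pick ranges that
    match what you expect to see in your own "in the wild" data.
    """
    out = image.copy()

    # Horizontal flip: plausible for many natural-image tasks,
    # but not for all of them (e.g. images containing text).
    if rng.random() < flip_prob:
        out = out[:, ::-1, :]

    # Small additive brightness shift, bounded so the result stays
    # close to the kind of image the network will actually see.
    shift = rng.uniform(-max_brightness_shift, max_brightness_shift)
    out = np.clip(out + shift, 0.0, 1.0)
    return out


rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))  # stand-in for a real training image
augmented = [augment(image, rng) for _ in range(4)]
```

If blurred images never occur in your use case, the same reasoning would suggest leaving blur out of such a function, or at least keeping its strength small. Widening the ranges moves the pipeline toward the second reason in the question (making memorization harder) and away from the first.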