Data augmentation

I am using the following code to do data augmentation of MNIST:

train_loader = torch.utils.data.DataLoader(
        datasets.MNIST('../data', train=True, download=True,
                       transform=transforms.Compose([
                           transforms.ToTensor(),
                           transforms.Normalize((0.1307,), (0.3081,)),
                       ])),
        batch_size=args.batch_size, shuffle=True, **kwargs)

I have a question about the line transforms.Normalize((0.1307,), (0.3081,)): 0.1307 and 0.3081 are the mean and standard deviation of the original MNIST dataset, but they will have changed after augmentation. So should I use the new mean and standard deviation for normalization? Another question: should I apply the same augmentation to the test set? If not, the training set (with augmentation) and the test set would come from different distributions, right?

The idea of augmenting data is to generate "similar" samples from the data-generating distribution, typically to avoid overfitting. The mean and std will change, but I believe the difference won't be significant. You can test this hypothesis yourself by computing the mean and std after the transforms. Also, you should apply the same transformation to the test set; your reasoning is correct in this regard.
