MNIST Normalization

1414b35e42c77e0a57dd · June 27, 2019, 4:27am

I’m confused about the normalization process for MNIST.
With a batch size of 64, I think transforms.Normalize((0.1307,), (0.3081,)) is right.

In my case, I already compute train_dataset.train_data.float() / 255. in my MNIST class, so the data has a mean of 0.1307 and a std of 0.3081.
Does that mean I don’t need to apply normalization any more?
This is my code:

import numpy as np
from torch.utils import data
from torchvision import datasets, transforms


class MNIST(data.Dataset):
    def __init__(self, split, transform=None):
        # self.transform = transformers.get_basic_transformer()
        self.split = split
        self.transform = transform

        if split == "train":

            train_dataset = datasets.MNIST(
                root="data/mnist", train=True, download=True, transform=None)
            self.X = train_dataset.train_data.float() / 255.
            self.y = train_dataset.train_labels

            np.random.seed(2)
            ind = np.random.choice(
                len(train_dataset), len(train_dataset), replace=False)
            self.X = self.X[ind]
            self.y = self.y[ind]

        elif split == "val":
            test_dataset = datasets.MNIST(root="data/mnist",
                                          train=False, download=True, transform=None)

            self.X = test_dataset.test_data.float() / 255.
            self.y = test_dataset.test_labels

    def __getitem__(self, index):
        X, y = self.X[index].clone(), self.y[index].clone()

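        # Add a channel dimension, (28, 28) -> (1, 28, 28), then apply the optional transform.
        X = X[None]
        if self.transform is not None:
            X = self.transform(X)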

        return X, y

    def __len__(self):
        """Return size of dataset."""
        return self.X.shape[0]

    pre_process = transforms.Compose(
        [transforms.Normalize(mean=[0.5], std=[0.5])])  # Do I have to do this?

What about the test dataset? Should I recalculate the mean and std from the test data?

ptrblck · June 27, 2019, 10:21am

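transforms.Normalize subtracts the given mean and divides by the given std for each channel. If you pass the statistics of the training data (0.1307 and 0.3081 for MNIST scaled to [0, 1]), the result will have zero mean and unit variance; with mean=[0.5] and std=[0.5] it maps [0, 1] to [-1, 1] instead.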
You don’t need to apply it, but it might help your training.

No, you should apply the same statistics, calculated from the training dataset, to the test data.
Otherwise this would be considered data leakage.
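
A minimal sketch of that idea, in plain Python with made-up pixel values so the arithmetic is visible (the lists `train_pixels` and `test_pixels` and the helper `standardize` are hypothetical and not part of the code above). The mean and std are computed from the training split only and then reused for the test split:

```python
import statistics

# Hypothetical pixel values already scaled to [0, 1] (not real MNIST data).
train_pixels = [0.0, 0.0, 0.2, 0.8, 1.0]
test_pixels = [0.0, 0.5, 1.0]

# Statistics come from the training split only.
train_mean = statistics.fmean(train_pixels)
train_std = statistics.pstdev(train_pixels)  # population standard deviation


def standardize(pixels, mean, std):
    """Apply (x - mean) / std elementwise, the same formula Normalize uses."""
    return [(x - mean) / std for x in pixels]


train_standardized = standardize(train_pixels, train_mean, train_std)
# The test split reuses the training statistics; it is never used to compute them.
test_standardized = standardize(test_pixels, train_mean, train_std)
```

The standardized test values will generally not have exactly zero mean or unit variance, which is expected, since the statistics describe the training data.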

1414b35e42c77e0a57dd · June 27, 2019, 10:52am

@ptrblck
Thanks for the reply!

My question was not clear.
What I wanted to ask is: train_dataset.train_data.float() / 255. already normalizes the data,
so is [transforms.Normalize(mean=[0.5], std=[0.5])] still needed?

ptrblck · June 27, 2019, 11:03am

I think your question was clear, but my answer probably wasn’t.

Dividing by 255. will scale the data to the range [0, 1].
transforms.Normalize will then standardize this data to zero mean and unit variance, provided you pass the mean and std of the training data.
So Normalize might help your training, but your model might also learn fine without it.

Anyway, I would recommend using it.
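
To make the difference between the two steps concrete, here is a small plain-Python sketch of the arithmetic. The helper names `scale_to_unit` and `normalize` are made up for illustration, and 0.1307 / 0.3081 are the MNIST statistics quoted earlier in the thread:

```python
def scale_to_unit(pixel):
    """Map a raw 8-bit pixel value (0..255) to the range [0, 1]."""
    return pixel / 255.0


def normalize(x, mean, std):
    """Per-value formula applied by transforms.Normalize: (x - mean) / std."""
    return (x - mean) / std


lo, hi = scale_to_unit(0), scale_to_unit(255)  # 0.0 and 1.0

# mean=0.5, std=0.5 only remaps [0, 1] to [-1, 1].
print(normalize(lo, 0.5, 0.5), normalize(hi, 0.5, 0.5))  # -1.0 1.0

# The MNIST statistics shift and scale the values so that the training set
# ends up with approximately zero mean and unit variance.
print(round(normalize(lo, 0.1307, 0.3081), 4))  # -0.4242
print(round(normalize(hi, 0.1307, 0.3081), 4))  # 2.8215
```

Either choice is applied after the division by 255, so the two steps complement each other and neither replaces the other.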