Normalization in the mnist example

Russel_Russel · February 12, 2017, 8:50am

In the examples, why are they using transforms.Normalize((0.1307,), (0.3081,)) for the MNIST dataset? Thanks.

avijit_dasgupta · February 12, 2017, 12:47pm

I think those are the mean and std deviation of the MNIST dataset.

apaszke · February 12, 2017, 1:29pm

@avijit_dasgupta is right. This is the mean and std computed on the training set.

dlmacedo · March 2, 2017, 3:20am

But the PyTorch Tutorial https://github.com/pytorch/tutorials/blob/master/Deep%20Learning%20with%20PyTorch.ipynb says we should always use 0.5 since we are getting PIL images:

# The output of torchvision datasets are PILImage images of range [0, 1].
# We transform them to Tensors of normalized range [-1, 1]
transform=transforms.Compose([transforms.ToTensor(),
                              transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
                             ])
trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4, 
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4, 
                                          shuffle=False, num_workers=2)

Why should it be any different for the MNIST dataset?

Thanks in advance,

David

smth · March 2, 2017, 3:21am

MNIST does not consist of natural images; its data distribution is quite different.

dlmacedo · March 2, 2017, 3:37am

What an honor to get a reply from you, smth.

But the PyTorch ImageNet example also uses values very different from 0.5, 0.5, 0.5.

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

train_loader = torch.utils.data.DataLoader(
    datasets.ImageFolder(traindir, transforms.Compose([
        transforms.RandomSizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        normalize,
    ])),
    batch_size=args.batch_size, shuffle=True,
    num_workers=args.workers, pin_memory=True)

I guess the PyTorch tutorial simply rescales each image from the range [0, 1] to [-1, 1], without considering the mean and std of the whole dataset.

David

smth · March 2, 2017, 3:39am

Yes. On ImageNet, we’ve done a pass over the dataset and calculated the per-channel mean/std. For CIFAR10, I thought it was unnecessary to introduce this to the reader, and we quite often just use 0.5, 0.5, 0.5 on many datasets to rerange them to [-1, +1]. Sorry if this was confusing.
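
To make the difference concrete, here is a minimal sketch of the per-value arithmetic, assuming Normalize computes (x - mean) / std on inputs that ToTensor() has already scaled to [0, 1]. The helper name `normalize` is made up for illustration and is not part of torchvision.

```python
def normalize(x, mean, std):
    """Per-value formula assumed for Normalize: (x - mean) / std."""
    return (x - mean) / std


# ToTensor() yields values in [0, 1]; check where the two endpoints land.
settings = [
    ("0.5 / 0.5 rerange", 0.5, 0.5),
    ("MNIST training stats", 0.1307, 0.3081),
]
for label, mean, std in settings:
    low = normalize(0.0, mean, std)
    high = normalize(1.0, mean, std)
    print(f"{label}: [0, 1] -> [{low:.4f}, {high:.4f}]")
```

With 0.5 / 0.5 the endpoints land exactly on -1 and +1, which is a fixed rerange. With dataset statistics the output range is not symmetric; instead the data as a whole ends up with roughly zero mean and unit std.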

dlmacedo · March 2, 2017, 3:41am

Ok.

Thank you very much for your answer.

David

achaiah · May 2, 2017, 4:02pm

Any way you could share the code with which you compute mean/std on the dataset? Do you use the dataset class and iterate over it?

Thanks

padamsethia · May 13, 2017, 3:53pm

Did you figure out the code for calculating the mean and std within PyTorch?

adamvest · June 17, 2017, 12:33am

You should just be able to use ImageFolder or some other data loader to iterate over ImageNet and then use the standard formulas to compute the mean and std at the channel level. E.g., for the mean, keep three running sums, one each for the R, G, and B channel values, as well as a total pixel count (if you are using Python 2, watch for int overflow on the pixel count; you may need a different strategy). Then simply divide the running sums by the pixel count.
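
A minimal sketch of the running-sum idea, written with NumPy rather than an actual PyTorch data loader. It assumes each image arrives as an H x W x C float array already scaled to [0, 1]; the function name `channel_stats` and the random stand-in images are made up for illustration. A running sum of squares is kept alongside the running sum so that the std comes out of the same pass.

```python
import numpy as np


def channel_stats(images):
    """Per-channel mean and std over an iterable of H x W x C arrays.

    Images may differ in height and width; only the channel count must match.
    """
    total = None      # running sum per channel
    total_sq = None   # running sum of squares per channel
    count = 0         # number of pixels seen so far (per channel)
    for img in images:
        arr = np.asarray(img, dtype=np.float64)
        pixels = arr.reshape(-1, arr.shape[-1])  # flatten to (num_pixels, C)
        if total is None:
            total = np.zeros(pixels.shape[1])
            total_sq = np.zeros(pixels.shape[1])
        total += pixels.sum(axis=0)
        total_sq += (pixels ** 2).sum(axis=0)
        count += pixels.shape[0]
    mean = total / count
    # Population variance: E[x^2] - E[x]^2, clipped to guard against rounding.
    std = np.sqrt(np.maximum(total_sq / count - mean ** 2, 0.0))
    return mean, std


# Hypothetical stand-in data: random "images" of different sizes.
rng = np.random.default_rng(0)
fake_images = [rng.random((h, w, 3)) for h, w in [(32, 32), (48, 20), (17, 64)]]
mean, std = channel_stats(fake_images)
print("mean:", mean)
print("std: ", std)
```

Because only sums and a count are stored, the images never need to be stacked into one array, so differing image sizes are not a problem. For a real dataset the loop would iterate over whatever yields the images.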

dlmacedo · June 25, 2017, 10:40pm

import argparse
import os
import numpy as np
import torchvision
import torchvision.transforms as transforms

dataset_names = ('cifar10','cifar100','mnist')

parser = argparse.ArgumentParser(description='PyTorchLab')
parser.add_argument('-d', '--dataset', metavar='DATA', default='cifar10', choices=dataset_names,
                    help='dataset to be used: ' + ' | '.join(dataset_names) + ' (default: cifar10)')

args = parser.parse_args()

data_dir = os.path.join('.', args.dataset)

print(args.dataset)

# The statistics are computed on the raw uint8 arrays (values 0-255), not on the
# ToTensor() output, hence the division by 255 to match the [0, 1] range.
# Recent torchvision releases expose the raw array as `.data`; older releases
# (as when this was posted) used `.train_data`.
if args.dataset == "cifar10":
    train_transform = transforms.Compose([transforms.ToTensor()])
    train_set = torchvision.datasets.CIFAR10(root=data_dir, train=True, download=True, transform=train_transform)
    print(train_set.data.shape)                       # NumPy array, (N, H, W, C)
    print(train_set.data.mean(axis=(0,1,2))/255)      # per-channel mean
    print(train_set.data.std(axis=(0,1,2))/255)       # per-channel std

elif args.dataset == "cifar100":
    train_transform = transforms.Compose([transforms.ToTensor()])
    train_set = torchvision.datasets.CIFAR100(root=data_dir, train=True, download=True, transform=train_transform)
    print(train_set.data.shape)                       # NumPy array, (N, H, W, C)
    print(np.mean(train_set.data, axis=(0,1,2))/255)  # per-channel mean
    print(np.std(train_set.data, axis=(0,1,2))/255)   # per-channel std

elif args.dataset == "mnist":
    train_transform = transforms.Compose([transforms.ToTensor()])
    train_set = torchvision.datasets.MNIST(root=data_dir, train=True, download=True, transform=train_transform)
    print(list(train_set.data.size()))                # torch tensor, [N, H, W]
    print(train_set.data.float().mean()/255)          # single channel: one mean
    print(train_set.data.float().std()/255)           # single channel: one std

isalirezag · August 3, 2017, 9:49pm

So how should I know which mean and std to use to transform my images? They are different for MNIST, CIFAR10, and ImageNet…

Is there any rule that I need to stick with?

Thanks

Jing · August 4, 2017, 4:08am

Just calculate them on the whole dataset, like @dlmacedo did.

jdhao · November 14, 2017, 6:47am

The code is not widely applicable. If the training images are not all the same size and are stored as image files, you cannot use it to calculate the per-channel mean and std.

Royi · November 18, 2017, 7:58pm

I tried, using transforms.Lambda(), to normalize the data per pixel with statistics from the whole dataset.
For some reason it made the results worse, though I’d think it would be a better strategy.

I wonder about something. Let’s say the first layer is a linear (fully connected) layer.
What’s the point of removing the mean from the data? Since there is a bias term that is optimized, wouldn’t it learn the best offset to begin with?

jdhao · November 21, 2017, 2:50am

With normalized input, SGD works better. If the features are not on approximately the same scale, it takes longer to find the minimum.

Royi · November 21, 2017, 6:02am

@jdhao, I wasn’t talking about the scaling; I was talking about the bias term.
Moreover, in the case of images all pixels are within the same range, so concerns like normalizing features with different units don’t apply here.

To put my question differently: after this “centering”, is the bias of the first-layer filter around 0?

SimonW · November 21, 2017, 6:18am

Training is more stable and faster when parameters are small. Indeed, none of these first-order optimization methods guarantees finding the minimum for an arbitrary network (in fact, they can’t even find it for simple ones). Therefore, although scaling and offsetting the input is equivalent to scaling the weights and offsetting the bias of the first linear layer, normalization often gives better results in practice.

Moreover, you shouldn’t normalize using every pixel’s mean and std. Since conv is an operation on channels, you should just use each channel’s mean and std.
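
The equivalence between input normalization and a rescaled first linear layer can be checked numerically. This is only a sketch with made-up data and a hypothetical layer (the names `W`, `b`, `W_folded`, and `b_folded` are illustrative), using NumPy instead of an actual network.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((5, 4))            # 5 samples, 4 input features (made-up data)
mu = x.mean(axis=0)               # per-feature mean
sigma = x.std(axis=0)             # per-feature std

W = rng.normal(size=(3, 4))       # weights of a hypothetical first linear layer
b = rng.normal(size=3)            # its bias

# The layer applied to normalized input.
out_normalized = ((x - mu) / sigma) @ W.T + b

# The same function with the normalization folded into the parameters.
W_folded = W / sigma              # scale each input column by 1 / sigma
b_folded = b - W_folded @ mu      # absorb the mean shift into the bias
out_folded = x @ W_folded.T + b_folded

print(np.allclose(out_normalized, out_folded))
```

Both parameterizations represent the same function, so a network could in principle learn the folded weights and bias by itself. The practical difference lies in where the optimizer starts and how large the parameters have to become, which is the point made above.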

lkins · January 17, 2018, 3:13am

Do we need tensors to be in the range of [-1,1] or is [0,1] okay? I have my own dataset of RGB images with a range of [0,1]. I manually normalized the dataset but the tensors are still in the range of [0,1]. What is the benefit of transforming the range to [-1,1]?