Understanding transforms.Normalize()

It really helps, thanks~

So if we encounter grayscale images, we will use

transforms.Normalize([0.5], [0.5])

and if we encounter RGB images (3 channels), we will use the following

transforms.Normalize(mean=[0.485, 0.456, 0.406],
                     std=[0.229, 0.224, 0.225]) 

Here is my implementation of a custom transform that combines ToTensor() and Normalize(), which I call ToTensor_Norm().

import numpy as np
import torch

class ToTensor_Norm(object):
    """Convert ndarrays in sample to Tensors and normalize the image."""
    def __init__(self, mean, std):
        self.mean = mean
        self.std = std

    def __call__(self, sample):
        image, pls, tr_angles, positions = \
            sample['image'], sample['pls'], sample['tr_angles'], sample['positions']
        # single-channel numpy image: H x W
        # torch image: C x H x W, so add a leading channel axis
        image = image[np.newaxis, :, :]
        # for a multi-channel H x W x C image, swap the color axis instead:
        # image = image.transpose((2, 0, 1))
        # cast to float so the in-place normalization below also works for integer input
        image = torch.from_numpy(image).float()
        # normalize each channel in place: (x - mean) / std
        for t, m, s in zip(image, self.mean, self.std):
            t.sub_(m).div_(s)

        return {'image': image,
                'pls': torch.from_numpy(pls),
                'tr_angles': torch.from_numpy(tr_angles),
                'positions': torch.from_numpy(positions)}

transformed_dataset = PLDataset(csv_file=csv_file,
                                transform=transforms.Compose([ToTensor_Norm([5.50180734], [8.27773514])]),
                                # transform=transforms.Compose([ToTensor()])
                                )

I had the same doubt while learning PyTorch. The arguments to Normalize depend on the number of channels. MNIST is grayscale with only one channel, so you can use transforms.Normalize((0.5,), (0.5,)). For three-channel images, for example CIFAR10, you need to specify a mean and std for each channel.

You have to divide by 255 and then you can proceed with transforms.Normalize()
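To make the two steps concrete, here is a minimal sketch using plain tensor arithmetic. The 2x2 image and the mean/std of 0.5 are made-up values for illustration only. ToTensor() already performs the division by 255 for uint8 input, so the manual division is only needed when you build the tensor yourself.

```python
import torch

# Hypothetical 2x2 grayscale image with raw uint8 pixel values in [0, 255].
raw = torch.tensor([[0, 64], [128, 255]], dtype=torch.uint8)

# Step 1: scale to [0, 1] and add a channel axis (shape becomes 1 x H x W).
scaled = raw.float().div(255.0).unsqueeze(0)

# Step 2: apply (x - mean) / std per channel, the same formula Normalize uses.
# mean=0.5 and std=0.5 are illustrative values, not dataset statistics.
mean = torch.tensor([0.5]).view(-1, 1, 1)
std = torch.tensor([0.5]).view(-1, 1, 1)
normalized = (scaled - mean) / std

print(scaled.min().item(), scaled.max().item())
print(normalized.min().item(), normalized.max().item())
```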


Just note that you need to use your own mean and std if your dataset is not similar to ImageNet. In the 3-channel case you mentioned, you are using the ImageNet mean and std, which work for most datasets of similar natural images. If you are working with a different kind of data, such as medical images, you need to compute the mean and std from your own dataset.


@bhushans23 , @InnovArul
In this case we are transforming from [0,1] to [-1,1] using normalization.

However, normalization usually means subtracting the dataset mean from each data point and then dividing by the dataset's standard deviation. In our case, if you consider the dataset to be 11 evenly spaced numbers in [0,1], i.e. (0.0, 0.1, 0.2, ..., 0.9, 1.0), its mean is 0.5 but its stddev is 0.316.
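The numbers above can be checked with the standard library. This is just a quick sketch of that arithmetic; note that 0.316 is the population standard deviation, while the sample standard deviation of the same 11 values is slightly larger.

```python
import statistics

# The 11 evenly spaced values 0.0, 0.1, ..., 1.0 from the example above.
values = [i / 10 for i in range(11)]

mean = statistics.fmean(values)
population_std = statistics.pstdev(values)  # divides by N
sample_std = statistics.stdev(values)       # divides by N - 1

print(round(mean, 3), round(population_std, 3), round(sample_std, 3))
```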

We use transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)). That is mean=0.5, stddev=0.5 for all three channels.

Can someone please explain how exactly we arrived at these numbers, and what one would do if the image values are in the range [0, 255]?


There is an important difference here. If you use mean=0.5 and std=0.5 on inputs in [0, 1], your output values will lie in [-1, 1], based on the normalization formula (x - mean) / std, which is also called the z-score. The result can fall outside [-1, 1] when you use the actual mean and std of a dataset, for instance ImageNet.
The definition says that we should use the population mean and std, but these are usually unavailable, so the sample mean and std can be used as estimates.
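As a rough illustration of the formula, the sketch below evaluates (x - mean) / std at the smallest and largest possible inputs. The helper normalize is a hypothetical stand-in written for this post, not a torchvision function. The last case addresses inputs in [0, 255]: scaling the mean and std by 255 is equivalent to dividing the pixels by 255 first.

```python
def normalize(x, mean, std):
    """Hypothetical helper: the per-channel formula (x - mean) / std."""
    return (x - mean) / std

# Inputs already scaled to [0, 1], with mean=0.5 and std=0.5.
half_range = (normalize(0.0, 0.5, 0.5), normalize(1.0, 0.5, 0.5))

# Same inputs with the first-channel ImageNet statistics quoted in this thread.
imagenet_range = (normalize(0.0, 0.485, 0.229), normalize(1.0, 0.485, 0.229))

# Raw [0, 255] inputs: scale the statistics by 255 instead of the pixels.
raw_range = (normalize(0.0, 0.5 * 255, 0.5 * 255),
             normalize(255.0, 0.5 * 255, 0.5 * 255))

print(half_range)      # endpoints for mean=0.5, std=0.5
print(imagenet_range)  # endpoints fall outside [-1, 1]
print(raw_range)       # same endpoints as half_range
```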


That's helpful, thanks.

I have seen that when training on the MNIST dataset, we use transforms.Normalize((0.1307,), (0.3081,)).
My understanding is that we calculate the mean of the dataset and subtract it from each image.

People use these values directly in their code, but there is no calculation showing how they were derived.

Also, is there a way to automate this, so that instead of hard-coding the values, the code calculates the mean of the dataset and subtracts it?

Would using batch norm as the first layer of our network have a similar effect?


See the answer by Soumith himself here. I hope that's what you are looking for: Normalization in the mnist example

Is this correct, or should all values be positive?
mean=[-0.16160992, -0.09780669, 0.44261688]
std = [1.3066291, 1.3798972, 1.4423286]

It depends on your dataset, and if the mean of all samples is negative (which might be the case), then these values look alright.

EDIT: Just to avoid confusion: if you are working with images, which are using uint8 pixel values, the mean should be positive, since these values cannot get negative values. However, for any other dataset the mean might be whatever makes sense. :wink:


The transformation to [-1,1] is performed to keep the values centered around 0, which can help training converge faster.

What if the values are not within the range [-1, 1] after Normalize? I checked the max and min values and they fall outside that range. My transformation:

transforms.Compose([transforms.ToTensor(), transforms.Normalize(mean=[0.35, 0.35, 0.35], std=[0.12, 0.12, 0.12])])

Normalize does not necessarily create output values in the range [-1, 1], but the “standard score” as explained by @Nikronic in this post.

Do you need the output values to be in a specific range?


Hi, I am a newbie in Machine learning. I still wonder why maximum is 1 and minimum is 0?


The whole dataset has been divided by 255.


Is there a script or piece of code to run on my own images to get the corresponding values similar to below to pass to Normalize method?

transforms.Normalize(mean=[0.485, 0.456, 0.406],
                     std=[0.229, 0.224, 0.225]) 

You could use an approach posted e.g. in this thread.
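For reference, one common way to do this is to accumulate per-channel sums over a DataLoader. The sketch below is my own illustration and not necessarily the code from the linked thread. The function name channel_mean_std and the random stand-in dataset are hypothetical; it assumes the loader yields float image batches of shape N x C x H x W as the first item, already scaled the way you will feed them to Normalize.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def channel_mean_std(loader):
    """Hypothetical helper: per-channel mean and std over every pixel in the loader."""
    n_pixels = 0
    channel_sum = None
    channel_sq_sum = None
    for images, *_ in loader:            # images: N x C x H x W
        images = images.double()         # accumulate in float64 for accuracy
        n, c, h, w = images.shape
        if channel_sum is None:
            channel_sum = torch.zeros(c, dtype=torch.float64)
            channel_sq_sum = torch.zeros(c, dtype=torch.float64)
        channel_sum += images.sum(dim=(0, 2, 3))
        channel_sq_sum += (images ** 2).sum(dim=(0, 2, 3))
        n_pixels += n * h * w
    mean = channel_sum / n_pixels
    # Population std via E[x^2] - E[x]^2.
    std = (channel_sq_sum / n_pixels - mean ** 2).sqrt()
    return mean, std

# Stand-in data: 64 random RGB images in [0, 1]; replace with your own dataset.
torch.manual_seed(0)
fake_images = torch.rand(64, 3, 8, 8)
loader = DataLoader(TensorDataset(fake_images), batch_size=16)

mean, std = channel_mean_std(loader)
print(mean.tolist(), std.tolist())
```

The resulting values can then be passed as transforms.Normalize(mean=mean.tolist(), std=std.tolist()). Compute them on the training split only, with the same scaling (for example after ToTensor()) that the images will have when Normalize is applied.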
