How to normalize a tensor to 0 mean and 1 variance?

Hi, I’m currently converting a tensor to a numpy array just so I can use sklearn.preprocessing.scale.
Is there a way to achieve this in PyTorch? I have seen there is torchvision.transforms.Normalize, but I can’t work out how to use it outside of the context of a dataloader. (I’m trying to use this on a tensor during training.)

Thanks in advance

1 Like

You could add the normalization in the __getitem__ function of your Dataset:

from torch.utils.data import Dataset

class MyDataset(Dataset):
    def __init__(self, X, y, transform=None):
        self.data = X
        self.target = y
        self.transform = transform

    def __getitem__(self, index):
        x = self.data[index]
        y = self.target[index]

        # Normalize your data here
        if self.transform:
            x = self.transform(x)

        return x, y

    def __len__(self):
        return len(self.data)

In this use case, you could set transform to something like this:

transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])
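Putting the two together, a minimal sketch (assuming X holds numpy or PIL images, which is what ToTensor expects) could look like this:

from torch.utils.data import DataLoader

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])
dataset = MyDataset(X, y, transform=transform)
loader = DataLoader(dataset, batch_size=32, shuffle=True)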
2 Likes

Would this do it?

import torch
from torchvision import transforms

mu = 2
std = 0.5
t = torch.tensor([1., 2., 3.])
(t - mu) / std
# or if t is an image
transforms.Normalize(mu, std)(t)

see:

https://pytorch.org/docs/master/torchvision/transforms.html#torchvision.transforms.Normalize

1 Like

Thanks, but this one won’t work for my use case, as I am not trying to do this when I load the data, but as part of another calculation that I am performing during training.
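
For reference, what I’m after is something like this rough sketch (standardizing an intermediate tensor on the fly; the tensor here is just a stand-in):

t = torch.randn(16, 128) * 3 + 5  # stand-in for an intermediate tensor from training
t_global = (t - t.mean()) / t.std()  # zero mean, unit variance over all elements
t_perdim = (t - t.mean(dim=0, keepdim=True)) / t.std(dim=0, keepdim=True)  # per feature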

1 Like

Yeah, I tried this and I always get an error:

for t, m, s in zip(tensor, self.mean, self.std):
TypeError: zip argument #2 must support iteration

Oh, the mean and std need to be arrays?

So you can’t zip self.mean and self.std if they are single values: zip takes multiple iterables and returns packaged tuples.

means = [self.mean] * tensor.size(0)
stds = [self.std] * tensor.size(0)
for t, m, s in zip(tensor, means, stds):
    # do stuff
    pass

This turns the means and stds into length-n lists, where n is the size of the first dimension of tensor.
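
So for the earlier snippet, something like this should avoid the zip error (a sketch: wrap the scalars in lists and give the tensor an image-like (C, H, W) shape):

t = torch.tensor([1., 2., 3.]).view(1, 1, 3)  # (C, H, W) with a single channel
norm = transforms.Normalize(mean=[2.0], std=[0.5])  # sequences, not scalars
out = norm(t)  # equivalent to (t - 2.0) / 0.5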

3 Likes

Great, I see. Thank you very much.

I haven’t figured out how to use transforms.Normalize on input data that is not an image. I get TypeError: tensor is not a torch image. Is there any way to use this method on non-images?

Normalize works on tensors, so the error message might come from another transformation:

norm = transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
x = torch.randn(3, 224, 224)
out = norm(x)
1 Like

That only works because your tensor has the dimensions of an image. If you look at the documentation, it says torchvision.transforms.Normalize is used to “normalize a tensor image with mean and standard deviation”. The argument is described as:

tensor (Tensor) – Tensor image of size (C, H, W) to be normalized.

My data is sequence data of dimension torch.Size([4, 589, 4])

2 Likes

Actually, you’re right: the error does go away if I get the dimensions right:

norm = transforms.Normalize((30, 30, 30, 30), (20, 25, 30, 35))
x = torch.randn(4, 589, 4)
out = norm(x)

But I don’t think this is applying the normalization correctly. The data from my data loader is shaped [batch_size, seq_length, x_dim], so the scaling should be applied to the last dimension, whereas I think Normalize applies the scaling across the first dimension (the image colour channels).
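
A manual sketch of what I mean, scaling over the last dimension instead (the statistics here are just placeholder values):

x = torch.randn(4, 589, 4)  # [batch_size, seq_length, x_dim]
mean = torch.tensor([30., 30., 30., 30.])  # one value per feature in x_dim
std = torch.tensor([20., 25., 30., 35.])
out = (x - mean) / std  # broadcasts over the last dimension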

Is it possible to extend/apply transforms.Normalize to normalize a multidimensional tensor in a custom PyTorch Dataset class? I have a tensor with shape (S x C x W x H) and I want to normalize over the C dimension.
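
i.e. something along these lines (a rough sketch of what I’d like to achieve, with placeholder statistics):

x = torch.randn(10, 3, 32, 32)  # (S, C, W, H)
mean = torch.tensor([0.5, 0.5, 0.5]).view(1, -1, 1, 1)  # shape (1, C, 1, 1)
std = torch.tensor([0.5, 0.5, 0.5]).view(1, -1, 1, 1)
out = (x - mean) / std  # normalizes each channel separately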

Thanks! I have a question on how to set the mean and std for each channel: are they calculated from the dataset?

Yes, you can calculate the mean and std from your training dataset or use some “default” values, e.g. from ImageNet.
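
A rough sketch for computing per-channel statistics (assuming all samples can be stacked into one (N, C, H, W) tensor that fits in memory; dataset here is a placeholder for your training set):

data = torch.stack([img for img, _ in dataset])  # (N, C, H, W)
mean = data.mean(dim=(0, 2, 3))  # per-channel mean
std = data.std(dim=(0, 2, 3))    # per-channel std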

Is there a way to apply different transforms to the mask vs. the input? For example, I want to apply all deformation transforms to both, but I only want to apply Normalize and ToTensor to the predicted masks (not the target).

I would recommend using the functional API for these use cases, as it allows you to apply the same “random” transformation to the data and target, and it can also be used to call some transformations on one of these tensors separately.
Have a look at this example.
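
A rough sketch of the idea (assuming PIL inputs; the flip and the statistics are just examples):

import random
import torchvision.transforms.functional as TF

def paired_transform(image, mask):
    # apply the same random flip to both
    if random.random() > 0.5:
        image = TF.hflip(image)
        mask = TF.hflip(mask)
    # tensor conversion and normalization only on the image
    image = TF.to_tensor(image)
    image = TF.normalize(image, mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
    return image, mask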

1 Like

@FilipAndersson245 and I found out that the correct way to unnormalize is:

x * std + mean

We also had to clamp a few values outside of [0,1].

For a single image the code would look something like this:

def inv_normalize(img):
    # ImageNet statistics used during normalization
    mean = torch.tensor([0.485, 0.456, 0.406]).unsqueeze(-1)
    std = torch.tensor([0.229, 0.224, 0.225]).unsqueeze(-1)
    # flatten to (3, H*W) so the stats broadcast, then restore the shape
    img = (img.view(3, -1) * std + mean).view(img.shape)
    img = img.clamp(0, 1)
    return img

Feel free to help if the code can be written in a simpler way!
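
Edit: one possible simplification (a sketch, if I’m not mistaken) is to reuse transforms.Normalize with inverted parameters, since (x - m) / s with m = -mean/std and s = 1/std is exactly x * std + mean:

mean = torch.tensor([0.485, 0.456, 0.406])
std = torch.tensor([0.229, 0.224, 0.225])
inv_normalize = transforms.Normalize(mean=(-mean / std).tolist(),
                                     std=(1 / std).tolist())
img = inv_normalize(img).clamp(0, 1)  # img: a normalized (3, H, W) image tensor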

1 Like

Hi @ptrblck, I am also trying to apply transforms.Normalize(mean, std) outside the data loader, somewhere in the training process. I am not sure how I would do this for a batch of images.

Also, I am using F.normalize(tensor, p=1, dim=1) inside my model. Now, if I am loading the data with transforms.Normalize(mean, std), does that mean I am applying the same normalization twice?

I saw the source for transforms.Normalize and it appears to be using F.normalize(tensor, self.mean, self.std, self.inplace), which I am not sure is the same thing or something different.

To apply transforms.Normalize on a batch you could either run this transformation in a loop on each input or normalize the data tensor manually via:

x = (x - mean) / std
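
E.g. a minimal sketch for a batch in (N, C, H, W) layout, reshaping the per-channel statistics so they broadcast:

mean = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)
x = torch.randn(8, 3, 224, 224)  # stand-in for a batch of images
x = (x - mean) / std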

Inside transforms.Normalize, the torchvision.transforms.functional API will be used as F.normalize.
This is not the same method as torch.nn.functional.normalize and it accepts different input arguments.
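
To illustrate the difference (a quick sketch): torchvision’s normalize standardizes with mean/std, while torch.nn.functional.normalize rescales to unit Lp norm:

import torch.nn.functional as F
import torchvision.transforms.functional as TF

x = torch.randn(3, 8, 8)
a = TF.normalize(x, mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])  # (x - mean) / std per channel
b = F.normalize(x, p=2, dim=0)  # divides by the L2 norm along dim 0 instead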

5 Likes