How does torchvision.transforms work?

So I'm confused here.
I am following the tutorial on the PyTorch website.
I am reading the images from CIFAR10, and as an initial stage I am doing some preprocessing on them.
Here I just want to normalize each channel (for a batch size of 1) to values between 0 and 1, but it doesn't look like that is working…
I am not sure if it is a bug or if I am making a mistake here…
So here is my code:


import torch
import torchvision
import torchvision.transforms as transforms

BatchSize = 1

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.0, 0.0, 0.0), (1.0, 1.0, 1.0))
    ])

TrainSet = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(TrainSet, batch_size=BatchSize,
                                          shuffle=True, num_workers=2)
dataiter = iter(trainloader)
images, labels = next(dataiter)

And here I just wanted to check the values:

print('Images size is: {} and it has min and max of {} and {}, \n\n \
      1st channel is {}\n \
      min and max of the 1st channel is {} and {} \n \n \
      2nd channel is {}\n \
      min and max of the 2nd channel is {} and {} \n \n \
      3rd channel is {} \n \
      min and max of the 3rd channel is {} and {} \n' \
      .format(images.size(), torch.min(images[0,:,:,:]), torch.max(images[0,:,:,:]),\
              images[0,0,:,:], torch.min(images[0,0,:,:]), torch.max(images[0,0,:,:]),\
              images[0,1,:,:], torch.min(images[0,1,:,:]), torch.max(images[0,1,:,:]),\
              images[0,2,:,:], torch.min(images[0,2,:,:]), torch.max(images[0,2,:,:]),\
             ) )

And the results were:

Images size is: torch.Size([1, 3, 32, 32]) and it has min and max of 0.0 and 1.0,

   1st channel is 

0.1647 0.2078 0.2588 … 0.4353 0.4431 0.4353
0.1569 0.2118 0.2078 … 0.4431 0.4392 0.4353
0.1686 0.1922 0.1961 … 0.4275 0.4392 0.4353
… ⋱ …
0.2824 0.2706 0.2745 … 0.3490 0.3647 0.3765
0.2667 0.2667 0.2706 … 0.3961 0.4000 0.4078
0.2588 0.2588 0.2627 … 0.4118 0.4078 0.4000
[torch.FloatTensor of size 32x32]

   min and max of the 1st channel is 0.0549019612372 and 1.0 

   2nd channel is 

0.1843 0.2314 0.2863 … 0.5216 0.5333 0.5255
0.1804 0.2392 0.2353 … 0.5333 0.5294 0.5216
0.1922 0.2157 0.2235 … 0.5255 0.5294 0.5216
… ⋱ …
0.2902 0.2784 0.2824 … 0.3725 0.3882 0.4000
0.2784 0.2784 0.2824 … 0.4196 0.4235 0.4314
0.2706 0.2706 0.2745 … 0.4353 0.4314 0.4235
[torch.FloatTensor of size 32x32]

   min and max of the 2nd channel is 0.0470588244498 and 1.0 

   3rd channel is 

0.1451 0.1765 0.2235 … 0.7922 0.8039 0.7922
0.1412 0.1843 0.1686 … 0.7765 0.7961 0.8000
0.1529 0.1608 0.1569 … 0.7333 0.7961 0.8078
… ⋱ …
0.3765 0.3686 0.3686 … 0.4745 0.4902 0.5020
0.3529 0.3529 0.3569 … 0.5216 0.5255 0.5333
0.3373 0.3333 0.3412 … 0.5373 0.5333 0.5255
[torch.FloatTensor of size 32x32]

   min and max of the 3rd channel is 0.0 and 0.815686285496 

As you can see, the three channels taken together are normalized to values between 0 and 1, but the channels looked at separately do not appear to be normalized.

I thought that defining the transform as above would normalize each channel to 0-1 separately. Can you please tell me how to do that if I want each channel to be normalized separately?

Thanks

transforms.Normalize normalizes the tensor with the provided mean and std.
See the docs.
If you just want to force your image to be in the range [0, 1], transforms.ToTensor() would be sufficient.
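
For example (a minimal check, using a random uint8 array in place of a real image), ToTensor converts an HxWxC uint8 image to a CxHxW float tensor and scales it from [0, 255] to [0.0, 1.0]:

import numpy as np
import torchvision.transforms as transforms

# ToTensor accepts a PIL Image or an HxWxC uint8 numpy array and
# returns a CxHxW float tensor scaled to [0.0, 1.0]
img = np.random.randint(0, 256, size=(32, 32, 3), dtype=np.uint8)
tensor = transforms.ToTensor()(img)
print(tensor.shape)                # torch.Size([3, 32, 32])
print(tensor.min(), tensor.max())  # both within [0.0, 1.0]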

Yeah, but I guess it is not doing it correctly. The docs say:

class torchvision.transforms.Normalize(mean, std)
Normalize a tensor image with mean and standard deviation. Given mean: (M1,…,Mn) and std: (S1,…,Sn) for n channels, this transform will normalize each channel of the input torch.*Tensor, i.e. input[channel] = (input[channel] - mean[channel]) / std[channel]

But I don't see it doing that for each channel. It looks to me like it is doing it for the whole tensor.

The function uses the provided values for mean and std.
In your case, Normalize() won’t do anything, since the mean is set to zero and the std to one for all three channels.
Usually you calculate the mean and std from your training data separately for each channel.
Alternatively, you could take the ImageNet statistics and just use them.
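
For example, here is a minimal sketch of computing the per-channel statistics for CIFAR10 (it loads the whole training set into a single tensor for brevity; for larger datasets you would accumulate the statistics batch by batch):

import torch
import torchvision
import torchvision.transforms as transforms

# Load all training images as one [50000, 3, 32, 32] tensor
dataset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                       download=True,
                                       transform=transforms.ToTensor())
loader = torch.utils.data.DataLoader(dataset, batch_size=len(dataset))
images, _ = next(iter(loader))

# Reduce over the batch and spatial dimensions: one value per channel
mean = images.mean(dim=(0, 2, 3))
std = images.std(dim=(0, 2, 3))
print(mean, std)  # pass these to transforms.Normalize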

Sorry for the miscommunication.
See here:
I want a mean of 0 and a std of 2.
So I do this, right?:


transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize([0.0, 0.0, 0.0], [2.0, 2.0, 2.0])
    ])

Shouldn't this function give me a mean of 0 and a std of 2 for each of the channels?

But the output is this:

print('Images size is: {} and it has mean and std of {} and {}, \n\n \
      1st channel is {}\n \
      mean and std of the 1st channel is {} and {} \n \n \
      2nd channel is {}\n \
      mean and std of the 2nd channel is {} and {} \n \n \
      3rd channel is {} \n \
      mean and std of the 3rd channel is {} and {} \n' \
      .format(images.size(), torch.mean(images), torch.std(images),\
              images[0,0,:,:], torch.mean(images[0,0,:,:]), torch.std(images[0,0,:,:]),\
              images[0,1,:,:], torch.mean(images[0,1,:,:]), torch.std(images[0,1,:,:]),\
              images[0,2,:,:], torch.mean(images[0,2,:,:]), torch.std(images[0,2,:,:]),\
             ) )

Images size is: torch.Size([1, 3, 32, 32]) and it has mean and std of 0.169625080517 and 0.124045148048,

   1st channel is 

0.1490 0.1490 0.1490 … 0.1333 0.1314 0.1275
0.1098 0.1098 0.1118 … 0.1000 0.1000 0.0961
0.0745 0.0745 0.0765 … 0.0725 0.0706 0.0686
… ⋱ …
0.0078 0.0118 0.0118 … 0.0059 0.0020 0.0039
0.0098 0.0118 0.0176 … 0.0020 0.0000 0.0039
0.0784 0.0765 0.0843 … 0.0784 0.0725 0.0765
[torch.FloatTensor of size 32x32]

   mean and std of the 1st channel is 0.139309515976 and 0.135508642984 

   2nd channel is 

0.1745 0.1745 0.1765 … 0.1588 0.1569 0.1510
0.1392 0.1392 0.1412 … 0.1294 0.1294 0.1216
0.1098 0.1098 0.1098 … 0.1078 0.1059 0.0980
… ⋱ …
0.0235 0.0294 0.0275 … 0.0176 0.0118 0.0118
0.0255 0.0275 0.0353 … 0.0118 0.0098 0.0098
0.0902 0.0902 0.0961 … 0.0843 0.0804 0.0824
[torch.FloatTensor of size 32x32]

   mean and std of the 2nd channel is 0.162101719105 and 0.123755085234 

   3rd channel is 

0.2588 0.2569 0.2569 … 0.2412 0.2392 0.2216
0.2216 0.2216 0.2216 … 0.2098 0.2118 0.1922
0.1902 0.1882 0.1882 … 0.1863 0.1863 0.1667
… ⋱ …
0.1020 0.1059 0.1059 … 0.0902 0.0843 0.0745
0.0863 0.0863 0.0961 … 0.0686 0.0667 0.0588
0.1275 0.1255 0.1333 … 0.1176 0.1137 0.1098
[torch.FloatTensor of size 32x32]

   mean and std of the 3rd channel is 0.207464006471 and 0.100518621594 

I am confused

No, the function uses these mean and std values to normalize the data, so that after the normalization the channels of the data have a mean of zero and a std of one. That only works out if the values you pass are the actual mean and std of your data.
This procedure is also known as z-scoring. You can find more information on the Wikipedia page.
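
To make it concrete, here is a small sketch with made-up values: Normalize applies (input[c] - mean[c]) / std[c] to each channel c, so passing mean=0 and std=2 simply halves every value rather than producing a std of 2, and different per-channel values act on each channel independently:

import torch
import torchvision.transforms as transforms

x = torch.rand(3, 32, 32)  # values in [0, 1]

# With mean=0 and std=2 every value is just divided by 2,
# so the result does NOT end up with a std of 2
y = transforms.Normalize([0., 0., 0.], [2., 2., 2.])(x.clone())
print(x.max(), y.max())  # y.max() is x.max() / 2
print(x.std(), y.std())  # y.std() is x.std() / 2, not 2

# Different per-channel values show the transform acts per channel
z = transforms.Normalize([0., 0., 0.], [1., 2., 4.])(x.clone())
print(z[0].max(), z[1].max(), z[2].max())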


Oh, so the mean and std that I give to the function are used to produce a mean and std of 0 and 1? So the mean and std I pass are not the output mean and std…

Well, is there a function where I can define what mean and std I want? Or a function that normalizes each channel to a specific range for me?

Yes! Given that you’ve calculated them from your data.
Normalization of the data usually helps the model training.

To transform your data to a desired mean and std, you can first standardize it to zero mean and unit std, then scale with the desired std and add the desired mean:

import torch

# Create random input images with mean 2 and std 4
mean = 2.
std = 4.
x = torch.randn(100, 3, 24, 24) * std + mean
print('mean: {}, std: {}'.format(x.mean(), x.std()))

# Standardize to mean 0 and std 1
x = x - x.mean()
x = x / x.std()
print('mean: {}, std: {}'.format(x.mean(), x.std()))

# Rescale to mean 0.5 and std 2
new_mean = 0.5
new_std = 2.
x = x * new_std
x = x + new_mean
print('mean: {}, std: {}'.format(x.mean(), x.std()))

If you want to apply it on images using transforms, have a look at this thread.
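
Alternatively, here is one way to fold both steps into a single Normalize call (a sketch, using the commonly quoted CIFAR10 statistics as stand-ins for values you would compute from your own data): since Normalize computes (x - mean) / std per channel, passing mean = m - new_mean * s / new_std and std = s / new_std yields (x - m) / s * new_std + new_mean:

import torch
import torchvision.transforms as transforms

# Per-channel mean m and std s of the data, computed beforehand
# (these are the commonly quoted CIFAR10 training statistics)
m = torch.tensor([0.4914, 0.4822, 0.4465])
s = torch.tensor([0.2470, 0.2435, 0.2616])
new_mean, new_std = 0.5, 2.0

# Normalize computes (x - mean) / std, so these arguments produce
# channels with mean new_mean and std new_std
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=(m - new_mean * s / new_std).tolist(),
                         std=(s / new_std).tolist()),
])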
