Transform normalization

Forgive me if I misunderstand this operator. Here are the transforms I am applying:

from torchvision import transforms

output_size = 256
color_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Resize((output_size, output_size), antialias=True),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])
depth_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Resize((output_size, output_size), antialias=True),
    transforms.Normalize(mean=[0.5], std=[0.5])
])

I assumed that the output values of the tensor would be in the range [mean - std, mean + std], but the values I print out are much larger/smaller. What did I do wrong?
Some context about my case: I am reading RGB and depth images with cv2.imread(filepath, -1) and then applying the above transforms directly.

transforms.Normalize subtracts the provided mean from the tensor and divides by the std to create a normalized sample. The mean and std are usually computed from the data so that the normalized output has zero mean and unit variance. The actual values are not bounded to [mean - std, mean + std], as seen in this example:

import torch

x = torch.randn(10000) * 123. + 5467
print(x.mean(), x.std())
# tensor(5469.2778) tensor(122.1908)

mean = x.mean()
std = x.std()

y = (x - mean) / std
print(y.mean(), y.std())
# tensor(-2.7153e-06) tensor(1.)
print(y.min(), y.max())
# tensor(-3.7724) tensor(3.4398)
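
For completeness, here is the same operation using transforms.Normalize directly on a made-up image tensor that is already in [0, 1] (the shape and stats are just for illustration):

import torch
from torchvision import transforms

# fake RGB image in [0, 1], shape (C, H, W)
img = torch.rand(3, 256, 256)

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])
out = normalize(img)

# each channel becomes (img[c] - mean[c]) / std[c],
# so for inputs in [0, 1] the output lies roughly in [-2.1, 2.6]
print(out.min(), out.max())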

I suppose that to get the mean and std as in that formulation, I should first rescale the tensor values to the range [0, 1]. Is that right?

Yes, the mean and std values in color_transform are the ImageNet stats computed on inputs scaled to [0, 1], while the values in depth_transform look like placeholder stats that also assume inputs in [0, 1].
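
As a rough sketch of that idea (the file paths and bit depths below are assumptions; adjust the divisors to your actual data), you could scale the arrays to [0, 1] before applying the transforms, since ToTensor only rescales uint8 inputs:

import cv2
import numpy as np

# hypothetical paths; cv2.imread(..., -1) keeps the original dtype
rgb = cv2.imread("rgb.png", -1)      # e.g. uint8, BGR, shape (H, W, 3)
depth = cv2.imread("depth.png", -1)  # e.g. uint16, shape (H, W)

# ToTensor only divides uint8 arrays by 255, so rescale manually;
# the ImageNet stats assume RGB order, so also convert from cv2's BGR
rgb = cv2.cvtColor(rgb, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
depth = depth.astype(np.float32) / 65535.0  # assuming 16-bit depth

color_out = color_transform(rgb)    # values roughly in [-2.1, 2.6]
depth_out = depth_transform(depth)  # values in [-1, 1]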
