# Transform normalization

Forgive me if I misunderstand this operator. Here is the transform that I am applying:

```python
output_size = 256

color_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Resize((output_size, output_size), antialias=True),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

depth_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Resize((output_size, output_size), antialias=True),
    transforms.Normalize(mean=[0.5], std=[0.5])
])
```

I assumed the output values of the tensor would lie in the range [mean - std, mean + std], but the values I print are much larger/smaller. What did I do wrong?
For some context on my case: I read the RGB and depth images with cv2.imread(filepath, -1) and then apply the above transforms directly.

`transforms.Normalize` subtracts the provided `mean` from the tensor and divides by `std` to create a normalized sample. When the passed stats match the data, this yields a zero-mean, unit-variance output, but the resulting values are not bounded to `[mean-std, mean+std]`, as this example shows:

```python
import torch

# Create data with a large mean and std
x = torch.randn(10000) * 123. + 5467
print(x.mean(), x.std())
# tensor(5469.2778) tensor(122.1908)

mean = x.mean()
std = x.std()

# Normalize to zero mean, unit variance
y = (x - mean) / std
print(y.mean(), y.std())
# tensor(-2.7153e-06) tensor(1.)

# The extreme values still lie several stds away from zero
print(y.min(), y.max())
# tensor(-3.7724) tensor(3.4398)
```

I suppose that to get the mean and std as in the formula, I should rescale the tensor values to the range [0, 1] first. Is that right?

Yes, the `mean` and `std` values in `color_transform` are the ImageNet stats, computed on inputs already scaled to `[0, 1]`, while the stats in `depth_transform` look like placeholder values, also intended for inputs in `[0, 1]`.
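Note that `transforms.ToTensor` only rescales `uint8` inputs to `[0, 1]`; a 16-bit depth map read via `cv2.imread(filepath, -1)` is converted without scaling, which would explain the large printed values. A minimal sketch (using random data as a hypothetical stand-in for a `uint16` depth map, and plain tensor math in place of `Normalize(mean=[0.5], std=[0.5])`):

```python
import torch

# Hypothetical stand-in for a 16-bit depth map as returned by
# cv2.imread(filepath, -1); ToTensor would NOT rescale this dtype.
depth_raw = torch.randint(0, 65536, (1, 256, 256)).float()

# Rescale manually to [0, 1] before normalizing
depth = depth_raw / 65535.0

# Normalize(mean=[0.5], std=[0.5]) computes (x - 0.5) / 0.5,
# mapping [0, 1] onto [-1, 1]
out = (depth - 0.5) / 0.5
print(out.min().item(), out.max().item())
```

With this rescaling in place, the normalized depth values stay within the expected `[-1, 1]` range.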
