Understanding transform.Normalize()

Thank you. Do you know why I am getting a tensor of 600 values instead of 3 values for the mean and std from train_loader? For the dataloader, I get exactly 3 values.

Isn’t the maximum value in each channel 255?
I’m assuming we didn’t divide each pixel by 255.

It depends on how you want to apply transform.Normalize().
Usually, transform.ToTensor() will scale the pixel values to be within [0, 1].
transform.Normalize() is supposed to work on these tensors.
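
For example, a minimal sketch of this common pipeline (assuming RGB images loaded as PIL images):

```python
from torchvision import transforms

# ToTensor converts a PIL image (or uint8 ndarray) in [0, 255]
# to a float tensor in [0, 1]; Normalize then shifts and scales per channel.
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)),
])
```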


Thanks for your contribution.

In my case, I have CT images with min and max Hounsfield intensity values of -1024 and 3597, which I scaled to be within [0, 1]. From the discussion so far, I realized that there is a need to normalize for better performance. My question is: do I need to do this for the validation and testing datasets? If yes, can I use the computed mean and standard deviation of the training dataset, or should I compute them separately for the validation and testing datasets?


I don’t think this is generally applicable to any given grayscale/RGB image(s). You need to compute the mean and standard deviation of your own dataset.

You should use the training statistics, as otherwise you might leak the validation and/or test dataset information into the training.
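
For example, one common way to estimate the per-channel statistics from the training loader (a sketch; train_loader is assumed to yield [batch, 3, H, W] float tensors, and averaging the per-sample stds is only an approximation of the true dataset std):

```python
import torch

mean = torch.zeros(3)
std = torch.zeros(3)
n_samples = 0
for images, _ in train_loader:
    # Flatten the spatial dimensions: [batch, 3, H*W]
    images = images.view(images.size(0), images.size(1), -1)
    mean += images.mean(dim=2).sum(dim=0)
    std += images.std(dim=2).sum(dim=0)
    n_samples += images.size(0)
mean /= n_samples
std /= n_samples

# Reuse these training statistics in Normalize for train, val, and test.
```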


How can you get the tensor values all in the range [0, 1] by using image = torch.randint(0, 255, (5, 5, 3), dtype=torch.uint8)? Doesn’t that produce values in the range [0, 255]? It does not seem right.
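
(For reference, transforms.ToTensor() is what performs the division by 255 when converting a uint8 input; a small sketch using a NumPy array, since ToTensor expects a PIL image or ndarray rather than a tensor:)

```python
import numpy as np
from torchvision import transforms

# uint8 values in [0, 255]; ToTensor scales them to floats in [0, 1].
image = np.random.randint(0, 255, (5, 5, 3), dtype=np.uint8)
tensor = transforms.ToTensor()(image)
print(tensor.min().item(), tensor.max().item())  # both within [0.0, 1.0]
```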

Hi,

The normalization is usually applied to the images. What if we have masks associated with the images as well? In that case the normalization gets applied to the mask images too, and I get the following error:
RuntimeError: output with shape [1, 512, 512] doesn’t match the broadcast shape [3, 512, 512]
This is because my masks are grayscale but the images are RGB. I want to apply the same transformations to both the images and the masks, except for the normalization. Any help in this regard will be highly appreciated. Thanks

The error is raised since the number of channels in the mask doesn’t match the provided mean and std in the Normalize transformation.

You could apply the Normalization only to the data tensor and skip it for the mask.
E.g. in case you are passing a transform object to the Dataset, remove the Normalize transformation from it and either apply it inside the Dataset, if you are using a custom Dataset implementation, or check if your current Dataset accepts a target_transform argument.
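
As a rough sketch of the custom Dataset approach (all names and stored tensors here are hypothetical):

```python
import torch
from torch.utils.data import Dataset
from torchvision import transforms

class SegmentationDataset(Dataset):
    # Hypothetical dataset holding preloaded image/mask tensors.
    def __init__(self, images, masks):
        self.images = images  # float tensors [3, H, W] in [0, 1]
        self.masks = masks    # tensors [1, H, W]
        self.normalize = transforms.Normalize((0.5, 0.5, 0.5),
                                              (0.5, 0.5, 0.5))

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        image = self.normalize(self.images[idx])  # normalize the image only
        mask = self.masks[idx]                    # mask keeps its raw values
        return image, mask
```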

Let me know if this helps.

Hi @InnovArul

In getting the mean and standard deviation of the training dataset, is the computation of np.mean(training_data_array) and np.std(training_data_array) the same as the batched mean and std statistics from your dataloader?

If not, how can I get the batch mean and standard deviation in the dataloader when transform.Normalize() is used in my dataloader?

What would the process be? Something like image_norm = image / 255, or something like that?
Thank you

Hi there,

The reason for transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)) is to convert the range from [0, 1] to [-1, 1]. Because your image is loaded from a PILImage and converted with ToTensor, it is in the range [0, 1] by default. Here, the author of the code makes use of the standardization formula to adjust the range of values; however, the purpose of this code is not true standardization. For example:

If you have a pixel with value 0, its conversion will be:
(0 - 0.5) / 0.5 = -1

If you have a pixel with value 1, its conversion will be:
(1 - 0.5) / 0.5 = 1

As a reminder, the standardization formula is:
(data_point_value - mean) / std

By putting mean = 0.5 and std = 0.5, we can make use of the existing transform function of PyTorch for this conversion.
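
A quick check of this mapping in code (a minimal sketch using a single-channel tensor):

```python
import torch
from torchvision import transforms

normalize = transforms.Normalize(mean=(0.5,), std=(0.5,))
x = torch.tensor([[[0.0, 0.5, 1.0]]])  # shape [1, 1, 3], values in [0, 1]
print(normalize(x))                    # tensor([[[-1., 0., 1.]]])
```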

Hi @ptrblck, could you please answer me?
The images’ type becomes float after applying transforms.Normalize(). My masks are binary images (0s and 1s, uint8). Should I change the masks’ type to float as well, even though they are only matrices of 0s and 1s? Also, in general, which type is better in image processing for image classification, segmentation, or detection: float or int?

Yes, that’s expected, as the default input type is float32 and Normalize creates tensors with a zero mean and a unit variance.

No, this sounds wrong. I assume you want to use your mask in a multiplication to “mask” specific values and are thus depending on the 1s and 0s. In this case, don’t use Normalize on it.
If that’s not the case, let me know how you are using these masks in your model.

As is often the case, it depends on your use case. If you are purely working on image processing, you might want to keep the image in an integer type and manipulate it directly. However, if you want to train e.g. a neural network using these images, floating point inputs are usually the way to go, since (all) math ops in your model will be using floating point tensors so that you can train them. Also, normalizing the inputs usually helps the model training.


Thanks for the quick response. I am working on medical images, doing tumour segmentation and detection. My masks are actually the outputs my model tries to predict from the input images. In fact, the input images are float images that I applied normalization to, and the outputs are binary masks (uint8) that I have not applied normalization to. My question is whether the types of the input images and masks should be the same. Does it matter or not? (During training, the masks are used in the loss calculation.)

It depends on the use case and which criterion you are using.
Since your masks contain 0s and 1s I assume you are working on a binary segmentation use case.
In this case, do not normalize the masks to keep the 0s and 1s, and transform the masks to FloatTensors via mask = mask.float() to be able to use nn.BCEWithLogitsLoss as the loss function.
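
For illustration, a minimal sketch of that setup (the shapes are hypothetical):

```python
import torch
import torch.nn as nn

criterion = nn.BCEWithLogitsLoss()
logits = torch.randn(4, 1, 512, 512)  # raw model outputs (no sigmoid)
mask = torch.randint(0, 2, (4, 1, 512, 512), dtype=torch.uint8)

# Cast the mask to float for the loss; the 0s and 1s are preserved.
loss = criterion(logits, mask.float())
```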


Alright, thanks for your help!

transforms.ToTensor() rescales the data to [0, 1]. You might want to read this for a better understanding of normalising vs. scaling the data.

What is the difference between X /= X.norm() and F.normalize(X)?
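
For reference, a small sketch of how the two differ (X.norm() returns the norm of the whole tensor, while F.normalize operates along a dimension, dim=1 by default):

```python
import torch
import torch.nn.functional as F

X = torch.randn(4, 3)

# Divides by the L2 norm of the *entire* tensor, so only the
# global norm of the result equals 1.
global_unit = X / X.norm()

# Divides each slice along dim=1 (each row here) by its own L2 norm,
# so every row of the result is a unit vector.
row_unit = F.normalize(X, p=2, dim=1)
```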