Thank you. Do you know why I am getting a tensor of 600 values instead of 3 values for the mean and std of train_loader? With dataloader I get exactly 3 values.
Isn't the maximum value in each channel 255?
I'm assuming we didn't divide each pixel by 255.
It depends on how you want to apply transforms.Normalize().
Usually, transforms.ToTensor() will scale the pixel values to [0, 1], and transforms.Normalize() is supposed to work on these tensors.
Thanks for your contribution.
In my case, I have CT images with minimum and maximum Hounsfield intensity values of -1024 and 3597, which I scaled to [0, 1]. From the discussion so far, I realized that normalization is needed for better performance. My question is: do I need to do this for the validation and test datasets as well? If yes, can I use the mean and standard deviation computed on the training dataset, or do I compute them separately for the validation and test datasets?
I don't think this is generally applicable to any given grayscale/RGB image(s). You need to compute the mean and standard deviation of your own dataset.
You should use the training statistics, as otherwise you might leak the validation and/or test dataset information into the training.
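A minimal sketch of that advice for the CT case, assuming single-channel N x 1 x H x W tensors filled with random Hounsfield values (hypothetical data). The HU window comes from the question; the helper names scale_hu and standardize are made up for illustration. The statistics are computed once on the training split and reused unchanged for validation and test.

```python
import torch

# HU window from the question above.
HU_MIN, HU_MAX = -1024.0, 3597.0

def scale_hu(volume: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: clamp to the HU window and min-max scale to [0, 1]."""
    volume = volume.clamp(HU_MIN, HU_MAX)
    return (volume - HU_MIN) / (HU_MAX - HU_MIN)

# Hypothetical random CT slices standing in for the real splits.
torch.manual_seed(0)
train = scale_hu(torch.randint(-1024, 3598, (8, 1, 16, 16)).float())
val = scale_hu(torch.randint(-1024, 3598, (2, 1, 16, 16)).float())
test = scale_hu(torch.randint(-1024, 3598, (2, 1, 16, 16)).float())

# Statistics come from the training split only...
mean = train.mean()
std = train.std()

def standardize(x: torch.Tensor) -> torch.Tensor:
    """...and are applied unchanged to every split, so no val/test information leaks."""
    return (x - mean) / std

train_n, val_n, test_n = standardize(train), standardize(val), standardize(test)
```

The validation and test tensors will not have exactly zero mean and unit variance afterwards, which is expected: they are measured against the training distribution.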
How can you get the values in the tensor all in the range [0, 1] by using image = torch.randint(0, 255, (5, 5, 3), dtype=torch.uint8)? Doesn't that produce uint8 values in the range [0, 255)? It does not seem right.
Hi,
Normalization is usually applied to the images. What if we also have masks associated with the images? In that case, the normalization is applied to the masks as well, and I get the following error:
RuntimeError: output with shape [1, 512, 512] doesn't match the broadcast shape [3, 512, 512]
This is because my masks are grayscale but the images are RGB. I want to apply the same transformations to both the images and the masks, except for the normalization. Any help would be highly appreciated. Thanks
The error is raised since the number of channels in the mask doesn't match the provided mean and std in the Normalization transformation.
You could apply the Normalization only to the data tensor and skip it for the mask.
E.g. in case you are passing a transform object to the Dataset, remove the Normalize transformation from it and either apply it inside the Dataset, if you are using a custom Dataset implementation, or check if your current Dataset accepts a target_transform argument.
Let me know, if this helps.
Hi @InnovArul
When computing the mean and standard deviation of the training dataset, is computing np.mean(training_data_array) and np.std(training_data_array) the same as the batched mean and std from your dataloader?
If not, how do I get the batch mean and standard deviation when the same transforms.Normalize() is used in my dataloader?
What would the process look like?
Something like image_norm = image / 255?
Thank you
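A minimal sketch of computing per-channel statistics over a DataLoader by accumulating sums and sums of squares, assuming a hypothetical dataset of 600 random RGB images already in [0, 1]. It also shows that np.mean/np.std give the same result only when reducing over the batch and spatial axes (axis=(0, 2, 3) for NCHW); calling them without an axis collapses all channels into one scalar. One possible cause of getting 600 values instead of 3 is reducing over dims (1, 2, 3), which yields one value per image.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical dataset: 600 RGB images of size 32 x 32 with values in [0, 1].
torch.manual_seed(0)
data = torch.rand(600, 3, 32, 32)
loader = DataLoader(TensorDataset(data), batch_size=64, shuffle=False)

n_pixels = 0
channel_sum = torch.zeros(3, dtype=torch.float64)
channel_sq_sum = torch.zeros(3, dtype=torch.float64)
for (batch,) in loader:
    batch = batch.double()
    # Reduce over batch, height and width (dims 0, 2, 3); keep the channel dim.
    channel_sum += batch.sum(dim=[0, 2, 3])
    channel_sq_sum += (batch ** 2).sum(dim=[0, 2, 3])
    n_pixels += batch.size(0) * batch.size(2) * batch.size(3)

mean = channel_sum / n_pixels
# Population std, sqrt(E[x^2] - E[x]^2), which matches np.std's default ddof=0.
std = (channel_sq_sum / n_pixels - mean ** 2).sqrt()

# Reference values over the full array, reducing over the same axes.
np_mean = data.numpy().mean(axis=(0, 2, 3))
np_std = data.numpy().std(axis=(0, 2, 3))

# Reducing over the wrong dims gives one value per image instead of per channel.
per_image_mean = data.mean(dim=(1, 2, 3))  # shape: [600]
```

The resulting mean and std can then be passed as tuples to transforms.Normalize; dividing by 255 alone only rescales and does not standardize.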
Hi there,
The reason for the code transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)) is to convert the range from [0, 1] to [-1, 1]. Because your image is loaded as a PIL Image and converted with transforms.ToTensor(), it is in the range [0, 1]. Here, the author of the code wants to reuse the standardization formula to adjust the range of values. However, the purpose of this code is not standardization (zero mean and unit variance). For example:
If you have a pixel with value 0, it is converted to:
(0 - 0.5) / 0.5 = -1
If you have a pixel with value 1, it is converted to:
(1 - 0.5) / 0.5 = 1
Recall the standardization formula:
(data_point_value - mean) / std
By setting mean = 0.5 and std = 0.5, we can reuse PyTorch's existing transform for this conversion.
Hi @ptrblck, could you please answer me?
The images' type becomes float after applying transforms.Normalize(). My masks are binary images (0 and 1, uint8). Should I change the masks' type to float as well, even though they only contain 0s and 1s? Also, in general, which type is better in image processing for image classification, segmentation, or detection: float or int?
Yes, that's expected, as the default input type is float32 and Normalize creates input tensors with zero mean and unit variance (given the dataset's mean and std).
No, this sounds wrong. I assume you want to use your mask in a multiplication to "mask" specific values and are thus depending on the 1s and 0s. In this case, don't use Normalize on it.
If that's not the case, let me know how you are using these masks in your model.
As often, it depends on your use case. If you are purely working on image processing you might want to keep the image in an integer type and manipulate it directly. However, if you want to train e.g. a neural network using these images, floating point inputs are usually the way to go since (all) math ops in your model will be using floating point tensors so that you can train them. Also, normalizing the inputs usually helps in the model training.
Thanks for the quick response. I am working on medical images, doing tumour segmentation and detection. My masks are actually the targets that my model tries to predict from the input images. The input images are float images that I normalized, and the masks are binary (uint8) and not normalized. My question is whether the input images and masks should have the same type. Does it matter or not? (During training, the masks are used in the loss calculation.)
It depends on the use case and which criterion you are using.
Since your masks contain 0s and 1s I assume you are working on a binary segmentation use case.
In this case, do not normalize the masks, so that the 0s and 1s are kept, and transform the masks to FloatTensors via mask = mask.float() to be able to use nn.BCEWithLogitsLoss as the loss function.
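A minimal sketch of that setup, assuming random logits stand in for the model output and a random uint8 0/1 tensor stands in for the ground-truth mask. The mask is left unnormalized and only cast with .float(), which keeps its values exactly 0.0 and 1.0.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Hypothetical model output: raw logits for 2 single-channel 4x4 masks.
logits = torch.randn(2, 1, 4, 4, requires_grad=True)

# Binary target stored as uint8; it is not passed through Normalize.
mask = torch.randint(0, 2, (2, 1, 4, 4), dtype=torch.uint8)

criterion = nn.BCEWithLogitsLoss()
# BCEWithLogitsLoss expects a floating-point target with the logits' shape.
target = mask.float()
loss = criterion(logits, target)
loss.backward()
```

BCEWithLogitsLoss applies the sigmoid internally, so the model should output raw logits rather than probabilities.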
Alright, thanks for your help
transforms.ToTensor() rescales the data to [0, 1]. You might want to read this for a better understanding of normalizing vs. scaling the data.
What is the difference between X /= X.norm() and F.normalize(X)?
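A small sketch of the difference, assuming a hypothetical 2 x 2 tensor and torch.nn.functional.normalize with its default arguments (p=2, dim=1, eps=1e-12). X.norm() returns one scalar, the L2 (Frobenius) norm of all elements, so the whole tensor is divided by it. F.normalize instead divides each slice along dim=1 (each row of a 2-D tensor) by its own L2 norm, clamped below by eps. Out-of-place division is used here so X stays available for comparison; X /= X.norm() would modify X in place.

```python
import torch
import torch.nn.functional as F

X = torch.tensor([[3.0, 4.0],
                  [6.0, 8.0]])

# One global norm: sqrt(9 + 16 + 36 + 64) = sqrt(125).
whole = X / X.norm()

# Row-wise norms (defaults p=2, dim=1): rows are divided by 5 and 10.
rows = F.normalize(X)

# F.normalize over the flattened tensor reproduces the global version.
whole_via_f = F.normalize(X.flatten(), dim=0).view(2, 2)
```

After F.normalize every row has unit L2 norm, while after X / X.norm() only the tensor as a whole does.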