When the input to a convolutional neural network is a (grayscale) image, it is common practice to scale the pixel values to the range [0, 1], for example with the torchvision.transforms.ToTensor transform.
Many works then subtract 0.5 and multiply by 2, mapping this range to [-1, 1]. This is of course a normalization step. However, both in the literature and in practical applications, there are many cases where this last transform is omitted.
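For concreteness, here is a minimal sketch of the two pipelines I mean (variable names are my own; Normalize with mean 0.5 and std 0.5 computes (x - 0.5) / 0.5 = 2x - 1, i.e. exactly the subtract-0.5-scale-by-2 step):

```python
import torch
from torchvision import transforms

# Pipeline 1: pixels scaled to [0, 1] only.
scale_only = transforms.ToTensor()

# Pipeline 2: additionally map [0, 1] to [-1, 1].
scale_and_center = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5], std=[0.5]),  # one channel for grayscale
])

# Sanity check on a synthetic grayscale tensor already in [0, 1]:
x = torch.rand(1, 28, 28)
y = transforms.Normalize(mean=[0.5], std=[0.5])(x)
print(y.min().item(), y.max().item())  # roughly -1 and 1
```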
What would be the arguments for and against applying this normalization step? Why do some projects incorporate it into their design while others do not?