How to avoid image normalization by torchvision.transforms.ToTensor

Hi everyone,
I have created a custom dataset for some medical images and want to use an ImageNet pre-trained model such as VGG16 for feature extraction. I will then design a custom fully-connected network and use scikit-learn's StratifiedKFold cross-validation to train and test the model.
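For context, the feature extractor I have in mind is roughly the following; the head sizes and class count are placeholders:

    import torch.nn as nn
    from torchvision import models

    # frozen VGG16 backbone used as a fixed feature extractor
    vgg = models.vgg16(pretrained=True)
    for p in vgg.features.parameters():
        p.requires_grad = False

    # custom fully-connected head (layer sizes and class count are placeholders)
    vgg.classifier = nn.Sequential(nn.Linear(512 * 7 * 7, 256),
                                   nn.ReLU(),
                                   nn.Dropout(0.5),
                                   nn.Linear(256, 2))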
Here is the code for the dataset:

import os

import pandas as pd
import torch
from PIL import Image
from torch.utils.data import Dataset

class ProjectionDataset(Dataset):  # class name assumed; the original post omits it
  def __init__(self, transform, kfold_index,
               img_path='/content/drive/MyDrive/projection images',
               img_data=pd.read_csv('/content/drive/MyDrive/projection images/images.csv'),
               img_files=os.listdir('/content/drive/MyDrive/projection images/projection images')):
    super().__init__()
    self.transform = transform
    self.img_path = img_path
    # column 7 of the CSV holds the label for each image
    self.img_label = [img_data.iloc[kfold_index[i], 7] for i in range(len(kfold_index))]
    self.img_files = [img_files[kfold_index[i]] for i in range(len(kfold_index))]

  def __len__(self):
    return len(self.img_files)

  def __getitem__(self, index):
    sample_path = os.path.join(self.img_path, self.img_files[index])
    img = Image.open(sample_path)
    img = self.transform(img)

    return img, torch.tensor(self.img_label[index])

Also, the torchvision transformations for the train and test datasets are as follows:

from torchvision import transforms

test_transform = transforms.Compose([transforms.Resize(256),
                                     transforms.CenterCrop(224),
                                     transforms.Grayscale(num_output_channels=3),
                                     transforms.ToTensor()])
train_transform = transforms.Compose([transforms.RandomResizedCrop(256, scale=(0.8, 1)),
                                      transforms.ColorJitter(),
                                      transforms.CenterCrop(224),
                                      transforms.RandomAffine(15),
                                      transforms.RandomHorizontalFlip(),
                                      transforms.Grayscale(num_output_channels=3),
                                      transforms.ToTensor()])
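For completeness, a rough sketch of how I produce the kfold_index argument with scikit-learn's StratifiedKFold (the fold count and seed are placeholders, and ProjectionDataset is the dataset class above):

    import numpy as np
    import pandas as pd
    from sklearn.model_selection import StratifiedKFold

    img_data = pd.read_csv('/content/drive/MyDrive/projection images/images.csv')
    labels = img_data.iloc[:, 7]  # the same label column the dataset reads

    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    for train_index, test_index in skf.split(np.zeros(len(labels)), labels):
        train_set = ProjectionDataset(train_transform, train_index)
        test_set = ProjectionDataset(test_transform, test_index)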

However, in my colleagues' previous work on the same dataset, the images were not normalized/standardized in the classic ways: all the images were simply divided by a constant (calculated from the range of the image values), rather than min-max normalized or standardized to zero mean and unit std. The models in those studies worked very well, so I want to use the same approach in my work.
The problem is that transforms.ToTensor() automatically normalizes the images. My question is: what is the best way to prevent this normalization?
Should I replace transforms.ToTensor() with transforms.PILToTensor()? When I do that, this error is raised:

RuntimeError: Input type (torch.cuda.ByteTensor) and weight type (torch.cuda.FloatTensor) should be the same

Or should I delete transforms.ToTensor() from the transforms.Compose and instead, in the return statement of the custom dataset, replace:

return img, torch.tensor(self.img_label[index])

with
1)

return torch.tensor(img), torch.tensor(self.img_label[index])

or
2)

return torch.Tensor(img), torch.tensor(self.img_label[index])

However, other errors were raised when I made these changes:

  1. Could not infer dtype of Image
  2. new(): data must be a sequence (got Image)

I would be grateful if you could tell me the best approach in my case.

The simplest thing to do is probably either to write your own ToTensor that calls a different function (see the function that is currently used here: torchvision.transforms.functional — Torchvision main documentation) or to add a transformation after ToTensor that effectively undoes the normalization (e.g., by multiplying by the range and adding the mean back), since you should know the normalization constants that were used.
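For instance, a minimal sketch of the second option, assuming 8-bit inputs so the only constant to undo is ToTensor's division by 255:

    from torchvision import transforms

    # ToTensor() maps uint8 pixels to floats in [0, 1]; multiplying by 255
    # afterwards restores the original value range while keeping the float
    # dtype the model expects
    test_transform = transforms.Compose([transforms.Resize(256),
                                         transforms.CenterCrop(224),
                                         transforms.Grayscale(num_output_channels=3),
                                         transforms.ToTensor(),
                                         transforms.Lambda(lambda t: t * 255.0)])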

Thanks a lot for your response.
Actually, despite reading the source code you linked, I cannot understand how the to_tensor function changes the range [0, 255] to [0, 1]. Would you please explain it a bit more?
Also, since I do not know how ToTensor normalizes the image, I cannot undo it. For instance, with min-max normalization I would need the minimum and maximum of every image's values to be able to undo it (regarding your second recommendation).

I believe ToTensor assumes an 8-bit image with a maximum possible value of 255 for each channel, so it simply divides by this maximum rather than explicitly computing the maximum over your dataset. You can see this behavior in the .div calls in the linked code.
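A quick way to check this, as a small sketch with a synthetic 8-bit image:

    import numpy as np
    from PIL import Image
    from torchvision.transforms.functional import to_tensor

    # a 4x4 grayscale image where every pixel is 128
    pic = Image.fromarray(np.full((4, 4), 128, dtype=np.uint8))
    print(to_tensor(pic).max())  # tensor(0.5020), i.e., 128 / 255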

You can also try converting to tensors directly without the normalization if you remove ToTensor from the Compose, which I realize would probably be simpler than modifying those functions, e.g., with something like:

    import numpy as np
    import torch

    # wrapped in a helper so the snippet is runnable; the name is just for illustration
    def pil_to_tensor_no_scale(pic):
        # handle PIL Image: convert to a tensor without dividing by 255
        img = torch.as_tensor(np.array(pic, copy=True))
        img = img.view(pic.size[1], pic.size[0], len(pic.getbands()))
        # put it from HWC to CHW format
        img = img.permute((2, 0, 1))
        return img
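Since Compose accepts any callable, the helper above can then stand in for ToTensor() directly; a usage sketch:

    from torchvision import transforms

    test_transform = transforms.Compose([transforms.Resize(256),
                                         transforms.CenterCrop(224),
                                         transforms.Grayscale(num_output_channels=3),
                                         pil_to_tensor_no_scale])  # output is still uint8 here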

I really appreciate your explanation.
In your code example, I think something like dtype=torch.float32 should be added in the line img = torch.as_tensor(np.array(pic, copy=True)); otherwise the tensor stays a ByteTensor and will not match the model's float weights. Is that okay?
I also wonder what the purpose of img = img.view(pic.size[1], pic.size[0], len(pic.getbands())) is, since my transformations use transforms.Grayscale(num_output_channels=3), which already produces images with shape (224, 224, 3).
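For reference, the change I mean is something like the following sketch, with a dummy image standing in for one of my samples:

    import numpy as np
    import torch
    from PIL import Image

    # dummy 8-bit image in place of a real dataset sample
    pic = Image.fromarray(np.zeros((224, 224, 3), dtype=np.uint8))

    # cast to float32 during conversion: the values stay in [0, 255]
    # but the dtype now matches the model's float weights
    img = torch.as_tensor(np.array(pic, copy=True), dtype=torch.float32)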