Torchvision.transforms on RGB image in customized Dataset

banikr · June 30, 2021, 12:31am

Following the example here, I am trying to implement transforms in a custom Dataset.

from torchvision import transforms as T
normalize = T.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]) # from ImageNet

t = T.Compose([
               T.RandomResizedCrop(224),
               T.RandomHorizontalFlip(),
               T.ToTensor(),
               normalize
              ])

Since my RGB images are huge, I tile them into patches before applying the transforms in Custom_dataset.

def __getitem__(self, index):                                  
    im = Image.open(self.imList[index]).convert('RGB')         
    mk = Image.open(self.mkList[index]).convert('L')           # binary mask
    im = np.asarray(im)                                        
    mk = np.asarray(mk)                                        
    im, mk = patchify(im, mk, [224, 224], [100, 100]) 
    i_t_s = torch.empty([64, 3, 224, 224])     # size of im                
    m_t_s = torch.empty([64, 224, 224])     # size of mk                      
    for l in range(0, len(im)):  # for 64 small image patches
        PIL_i = Image.fromarray(im[l, ...]).convert('RGB')     
        PIL_m = Image.fromarray(mk[l, ...]).convert('L')       
        i_t = self.tform(PIL_i) # [3, 224, 224]                
        m_t = T.ToTensor()(PIL_m)                              
        i_t_s[l, ...] = i_t                  
        m_t_s[l, ...] = m_t                                    
    m_t_o = torch.ones(2, 64, 224, 224)                                  
    m_t_o[1, ...] = m_t_s == 1.                                
    m_t_o[0, ...] = m_t_o[0, ...] - m_t_o[1, ...]              
    m_t_o = torch.transpose(m_t_o, 1, 0) #, 2, 3)              
    return i_t_s, m_t_o # size --> [64, 3, 224, 224], [64, 2, 224, 224]

Questions:

1. How do I apply the same transformation to both the image patch and its paired mask patch?
2. T.Normalize must be applied to a Tensor, and applying it after T.ToTensor() changes the image values from [0, 1] to normalized values outside that range. Is there a way to normalize first and then apply T.ToTensor() so that the values stay within [0, 1]?

N.B. patchify creates paired patches of RGB and binary mask.

eqy · June 30, 2021, 4:47am

1. You can apply the same transformation to both if you specify the parameters yourself and use the functional versions of the transformations: torchvision.transforms.functional — Torchvision 0.10.0 documentation (pytorch.org).
2. ToTensor should already scale the values to [0, 1]: torchvision.transforms.transforms — Torchvision 0.10.0 documentation (pytorch.org).

banikr · July 1, 2021, 12:21am

Thanks for replying.
The second question was meant to clarify a dilemma: Normalize cannot be applied to a non-Tensor, but applying it after ToTensor() changes the value range from [0, 1] to roughly [-2.11, 2.6].
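For reference, that range follows directly from the Normalize arithmetic, which computes (x - mean) / std per channel. A small worked example using only the ImageNet statistics quoted in the thread:

```python
# ImageNet statistics quoted in the thread.
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]

# Normalize computes (x - mean) / std per channel, so the ToTensor range
# [0, 1] maps to [(0 - mean) / std, (1 - mean) / std].
lows = [(0.0 - m) / s for m, s in zip(mean, std)]
highs = [(1.0 - m) / s for m, s in zip(mean, std)]

for name, lo, hi in zip("RGB", lows, highs):
    print(f"{name}: [{lo:.3f}, {hi:.3f}]")
# R: [-2.118, 2.249]
# G: [-2.036, 2.429]
# B: [-1.804, 2.640]
```

So values outside [0, 1] are the expected result of normalization rather than a sign that ToTensor failed to scale the image.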
For example, the one used in the torchvision.transforms documentation (Torchvision 0.16):

transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
])

If I use it like that, the patch fed to the network as input looks like this when visualized:

Without normalization:

I visualize the inputs with the following code:

    unloader = T.ToPILImage()
    def tensor_to_PIL(tensor):
        image = tensor.cpu().clone()
        image = image.squeeze(0)
        image = unloader(image)
        return image
    #
    # print(x.shape, y.shape, torch.unique(y), torch.max(x), torch.min(x), torch.unique(y[:, 0, ...]))
    Pimage = tensor_to_PIL(x[32, ...]) # x size --> [64, 3, 224, 224]
    plt.imshow(Pimage)
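One likely reason the normalized patch looks wrong is that `ToPILImage` expects float tensors in [0, 1], while the normalized values fall outside that range. A hedged sketch of undoing the normalization for display only, assuming the ImageNet statistics from above; the helper name `denormalize` is hypothetical:

```python
import torch

# Shaped [3, 1, 1] so they broadcast over a [3, H, W] image tensor.
IMAGENET_MEAN = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
IMAGENET_STD = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)


def denormalize(tensor):
    """Map a normalized [3, H, W] tensor back to the [0, 1] range."""
    # Invert (x - mean) / std, then clamp away small numerical overshoot.
    image = tensor.detach().cpu() * IMAGENET_STD + IMAGENET_MEAN
    return image.clamp(0.0, 1.0)
```

With this, the visualization call could become `Pimage = tensor_to_PIL(denormalize(x[32, ...]))`, while the network still receives the normalized tensor.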