Normalization in custom Dataset class

soloupis · July 20, 2019, 3:02pm

Hello fellow Pytorchers,

I am trying to add normalization to the custom Dataset class that PyTorch provides in this tutorial.

The problem is that it always gives the same error:

TypeError: tensor is not a torch image.

As you can see, the custom ToTensor() returns:

return {'image': torch.from_numpy(image), 'masks': torch.from_numpy(landmarks)}

so I think it already returns a tensor.

Here is my code:

class Rescale(object):
    """Rescale the image in a sample to a given size.

    Args:
        output_size (tuple or int): Desired output size. If tuple, output is
            matched to output_size. If int, smaller of image edges is matched
            to output_size keeping aspect ratio the same.
    """

    def __init__(self, output_size):
        assert isinstance(output_size, (int, tuple))
        self.output_size = output_size

    def __call__(self, sample):
        image, landmarks = sample['image'], sample['masks']

        h, w = image.shape[:2]
        if isinstance(self.output_size, int):
            if h > w:
                new_h, new_w = self.output_size * h / w, self.output_size
            else:
                new_h, new_w = self.output_size, self.output_size * w / h
        else:
            new_h, new_w = self.output_size

        new_h, new_w = int(new_h), int(new_w)

        img = transform.resize(image, (new_h, new_w))

        # h and w are swapped for landmarks because for images,
        # x and y axes are axis 1 and 0 respectively
        #landmarks = landmarks

        return {'image': img, 'masks': landmarks}


class ToTensor(object):
    """Convert ndarrays in sample to Tensors."""

    def __call__(self, sample):
        image, landmarks = sample['image'], np.array(sample['masks'])

        # swap color axis because
        # numpy image: H x W x C
        # torch image: C X H X W
        image = image.transpose(2,0,1)
        return {'image': torch.from_numpy(image),'masks': torch.from_numpy(landmarks)}
      
transformed_train_dataset = MasksTrainDataset(csv_file='pneumo_input/train/train-rle.csv',
                                           root_dir='pneumo_input/train/images/256/dicom/',
                                           transform=transforms.Compose([
                                               Rescale(224),
                                               ToTensor(),
                                               transforms.Normalize(mean=[0.5, 0.5, 0.5],std=[0.5, 0.5, 0.5])
                                           ]))

for i in range(len(transformed_train_dataset)):
    sample = transformed_train_dataset[i]

    print(i, sample['image'].size(), sample['masks'])

    if i == 3:
        break
        
train_dataloader = DataLoader(transformed_train_dataset, batch_size=4,
                        shuffle=True, num_workers=4)

I am using grayscale images converted to RGB.

Thank you in advance!

Nikronic · July 20, 2019, 3:58pm

Hi,

It is probably about the code you have implemented in the __getitem__() method of your MasksTrainDataset. Can you post how this method returns an item of your dataset?

soloupis · July 20, 2019, 5:38pm

@Nikronic

Check this out:

class MasksTrainDataset(Dataset):
    """Masks Train dataset."""

    def __init__(self, csv_file, root_dir, transform=None):
        """
        Args:
            csv_file (string): Path to the csv file with annotations.
            root_dir (string): Directory with all the images.
            transform (callable, optional): Optional transform to be applied
                on a sample.
        """
        self.masks_frame = pd.read_csv(csv_file, skiprows=1500, nrows=10000)
        self.root_dir = root_dir
        self.transform = transform

    def __len__(self):
        return len(self.masks_frame)

    def __getitem__(self, idx):
        img_name = os.path.join(self.root_dir,
                                self.masks_frame.iloc[idx, 0])
        image = io.imread(img_name + '.png')
        image = cv2.cvtColor(image,cv2.COLOR_GRAY2RGB)
        ## use strip to get exact result
        masks = self.masks_frame.iloc[idx, 1].strip()
        if masks == '-1':
            mark = 0
        else:
            mark = 1
            
        #masks = np.array([masks])
        #masks = masks.astype('float').reshape(-1, 2)
        sample = {'image': image, 'masks': mark}

        if self.transform:
            sample = self.transform(sample)

        return sample
      
      
train_dataset = MasksTrainDataset(csv_file='pneumo_input/train/train-rle.csv',
                                    root_dir='pneumo_input/train/images/256/dicom/')
print(len(train_dataset))
for i in range(len(train_dataset)):
    sample = train_dataset[i]

    print(i, sample['image'].shape, sample['masks'])
    print(type(sample['masks']))

    if i == 4:
        plt.show()
        break

What do you think?

Nikronic · July 20, 2019, 6:04pm

OK,
It seems your image and masks are CV2 objects. PyTorch's image backend is Pillow if you want to do some transformation on them. And as you can see in the ToTensor class, it expects a numpy array or a PIL image. So you can solve this issue by converting your image and masks to a numpy array or a Pillow image in __getitem__().

I have not tried it, but np.array(your image or mask) should do the job.

soloupis · July 20, 2019, 6:13pm

Thank you, I will definitely try it.

soloupis · July 20, 2019, 7:16pm

@Nikronic

It seems that I cannot make it work. I have to use a method that turns the single channel of a grayscale image into 3 channels (RGB). I thought I had managed it with CV, but then I had problems with the normalize function. I used:

image = Image.open(img_name + '.png').convert('RGB')

but then it raises other shape-related errors.

Do you think the error is in the code above rather than in the use of CV?

Nikronic · July 20, 2019, 7:38pm

Actually, your problem should not be CV or PIL, because once you pass a numpy array, both can give the same result.

Your code to convert to RGB is correct; PIL just duplicates the gray channel twice and concatenates the copies to make a 3-channel image.

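Try this code and please post the errors (it is hard to track the problem down without them):

import numpy as np
from PIL import Image

# np.array() takes the image positionally; it has no "image" keyword argument
print(np.array(Image.open(img_name + '.png')).shape)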
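
For reference, the channel duplication mentioned above can be sketched in plain NumPy. The array values here are made up, and the snippet only mirrors the effect of turning a single-channel image into RGB; it is not how PIL or OpenCV implement the conversion internally.

```python
import numpy as np

# Hypothetical 2 x 2 grayscale image (H x W); values chosen arbitrarily.
gray = np.array([[0, 64], [128, 255]], dtype=np.uint8)

# Repeat the single channel three times along a new last axis: H x W x 3.
rgb = np.stack([gray, gray, gray], axis=-1)

print(gray.shape)  # (2, 2)
print(rgb.shape)   # (2, 2, 3)

# Every channel carries exactly the same values as the grayscale input.
channels_equal = all(np.array_equal(rgb[..., c], gray) for c in range(3))
print(channels_equal)  # True
```

So after the conversion the array has a third axis of size 3, which is what the `image.transpose(2, 0, 1)` call in the custom ToTensor relies on.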

soloupis · July 20, 2019, 8:55pm

@Nikronic
I changed everything to the code below:

class MasksTrainDataset(Dataset):
    """Masks Train dataset."""

    def __init__(self, csv_file, root_dir, transform=None):
        """
        Args:
            csv_file (string): Path to the csv file with annotations.
            root_dir (string): Directory with all the images.
            transform (callable, optional): Optional transform to be applied
                on a sample.
        """
        self.masks_frame = pd.read_csv(csv_file, skiprows=1500, nrows=10000)
        self.root_dir = root_dir
        self.transform = transform

    def __len__(self):
        return len(self.masks_frame)

    def __getitem__(self, idx):
        img_name = os.path.join(self.root_dir,
                                self.masks_frame.iloc[idx, 0])
        
        image = Image.open(img_name + '.png')
        ## use strip to get exact result
        masks = self.masks_frame.iloc[idx, 1].strip()
        if masks == '-1':
            mark = 0
        else:
            mark = 1
            
        #masks = np.array([masks])
        #masks = masks.astype('float').reshape(-1, 2)
        sample = {'image': np.array(image), 'masks': np.array(mark)}

        if self.transform:
            sample = self.transform(sample)

        return sample
      
      
train_dataset = MasksTrainDataset(csv_file='pneumo_input/train/train-rle.csv',
                                    root_dir='pneumo_input/train/images/256/dicom/')
print(len(train_dataset))
for i in range(len(train_dataset)):
    sample = train_dataset[i]

    print(i, sample['image'].shape, sample['masks'])
    print(type(sample['masks']))

    if i == 4:
        plt.show()
        break

And the transforms:

class Rescale(object):
    """Rescale the image in a sample to a given size.

    Args:
        output_size (tuple or int): Desired output size. If tuple, output is
            matched to output_size. If int, smaller of image edges is matched
            to output_size keeping aspect ratio the same.
    """

    def __init__(self, output_size):
        assert isinstance(output_size, (int, tuple))
        self.output_size = output_size

    def __call__(self, sample):
        image, landmarks = sample['image'], sample['masks']

        h, w = image.shape[:2]
        if isinstance(self.output_size, int):
            if h > w:
                new_h, new_w = self.output_size * h / w, self.output_size
            else:
                new_h, new_w = self.output_size, self.output_size * w / h
        else:
            new_h, new_w = self.output_size

        new_h, new_w = int(new_h), int(new_w)

        img = transform.resize(image, (new_h, new_w))

        # h and w are swapped for landmarks because for images,
        # x and y axes are axis 1 and 0 respectively
        #landmarks = landmarks

        return {'image': np.array(img), 'masks': np.array(landmarks)}


class ToTensor(object):
    """Convert ndarrays in sample to Tensors."""

    def __call__(self, sample):
        image, landmarks = np.array(sample['image']), np.array(sample['masks'])

        # swap color axis because
        # numpy image: H x W x C
        # torch image: C X H X W
        image = image.transpose(2,0,1)
        #print(image.shape)
        #normalize = transforms.Normalize(mean=[0.5, 0.5, 0.5],std=[0.5, 0.5, 0.5])
        #return {'image': torch.from_numpy(image).unsqueeze(0),'masks': torch.from_numpy(landmarks)}
        #print(torch.is_tensor(torch.from_numpy(image)) and torch.from_numpy(image).ndimension() == 3)
        return {'image': torch.from_numpy(image),'masks': torch.from_numpy(landmarks)}
      
transformed_train_dataset = MasksTrainDataset(csv_file='pneumo_input/train/train-rle.csv',
                                           root_dir='pneumo_input/train/images/256/dicom/',
                                           transform=transforms.Compose([
                                               Rescale(224),
                                               ToTensor(),
                                               transforms.Normalize(mean=[0.5, 0.5, 0.5],std=[0.5, 0.5, 0.5])
                                           ]))

for i in range(len(transformed_train_dataset)):
    sample = transformed_train_dataset[i]

    print(i,sample['image'].size(),sample['masks'])

    if i == 3:
        break
        
train_dataloader = DataLoader(transformed_train_dataset, batch_size=4,
                        shuffle=True, num_workers=4)

Still the same error:

----------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-98-710d244d9279> in <module>()
     61 
     62 for i in range(len(transformed_train_dataset)):
---> 63     sample = transformed_train_dataset[i]
     64 
     65     print(i,sample['image'].size(),sample['masks'])

3 frames
<ipython-input-96-283923ce157c> in __getitem__(self, idx)
     37 
     38         if self.transform:
---> 39             sample = self.transform(sample)
     40 
     41         return sample

/usr/local/lib/python3.6/dist-packages/torchvision/transforms/transforms.py in __call__(self, img)
     59     def __call__(self, img):
     60         for t in self.transforms:
---> 61             img = t(img)
     62         return img
     63 

/usr/local/lib/python3.6/dist-packages/torchvision/transforms/transforms.py in __call__(self, tensor)
    162             Tensor: Normalized Tensor image.
    163         """
--> 164         return F.normalize(tensor, self.mean, self.std, self.inplace)
    165 
    166     def __repr__(self):

/usr/local/lib/python3.6/dist-packages/torchvision/transforms/functional.py in normalize(tensor, mean, std, inplace)
    199     """
    200     if not _is_tensor_image(tensor):
--> 201         raise TypeError('tensor is not a torch image.')
    202 
    203     if not inplace:

TypeError: tensor is not a torch image.

What do you think?
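
A note on reading the traceback: the error is raised by the `_is_tensor_image` check inside `F.normalize`. The sketch below assumes that this check amounts to "is a torch tensor with exactly three dimensions", which is also what the commented-out print in the ToTensor class above tests. The function name `looks_like_tensor_image` and the sample values are made up for illustration.

```python
import torch


def looks_like_tensor_image(x):
    """Approximation of the check behind the error: a 3-D torch tensor."""
    return torch.is_tensor(x) and x.ndimension() == 3


# Stand-in for what the custom ToTensor returns: a dict, not a tensor.
sample = {
    "image": torch.zeros(3, 224, 224),
    "masks": torch.tensor(1),
}

print(looks_like_tensor_image(sample))           # False
print(looks_like_tensor_image(sample["image"]))  # True
```

Under that assumption, whatever object the previous transform in the Compose list returns is what gets checked, not the image stored inside it.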

soloupis · July 21, 2019, 6:07am

@Nikronic

I think the problem is that the custom ToTensor method returns a dictionary.
I changed the code to the following:

transformed_train_dataset = MasksTrainDataset(csv_file='pneumo_input/train/train-rle.csv',
                                           root_dir='pneumo_input/train/images/256/dicom/',
                                           transform=transforms.Compose([
                                               Rescale(224),
                                               ToTensor(),
                                               transforms.Lambda(lambda x: x['image'].repeat(1, 1, 1)),
                                               transforms.Normalize(mean=[0.5, 0.5, 0.5],std=[0.5, 0.5, 0.5]),
                                           ]))

and normalization works. BUT now, with the Lambda function, I lose the labels (x['masks']).

I have found where the problem is, though. I have to normalize the image before returning the dictionary in the custom ToTensor method.

soloupis · July 21, 2019, 6:46am

@Nikronic

Final and working version:

class ToTensor(object):
    """Convert ndarrays in sample to Tensors."""

    def __call__(self, sample):
        image, landmarks = sample['image'], np.array(sample['masks'])

        # swap color axis because
        # numpy image: H x W x C
        # torch image: C X H X W
        image = image.transpose(2,0,1)
        print(image.shape)
        image = torch.from_numpy(image).float()
        
        in_transform = transforms.Compose([transforms.Normalize([0.5, 0.5, 0.5],[0.5, 0.5, 0.5])])
        ## discard the transparent, alpha channel (that's the :3)
        image = in_transform(image)[:3,:,:]
        
               
        return {'image': image,'masks': torch.from_numpy(landmarks)}
        #return torch.from_numpy(image).float()
      
transformed_train_dataset = MasksTrainDataset(csv_file='pneumo_input/train/train-rle.csv',
                                           root_dir='pneumo_input/train/images/256/dicom/',
                                           transform=transforms.Compose([
                                               Rescale(224),
                                               ToTensor(),
                                           ]))

for i in range(len(transformed_train_dataset)):
    print(type(transformed_train_dataset))
    print(len(transformed_train_dataset))
    sample = transformed_train_dataset[i]
    

    print(i,sample['image'],sample['masks'])

    if i == 3:
        break
        
train_dataloader = DataLoader(transformed_train_dataset, batch_size=4,
                        shuffle=True, num_workers=4)

Nikronic · July 21, 2019, 9:59am

Yes, you are right: you should not return a dictionary from ToTensor or any other transform class.
Sorry for the late answer (time zone differences!).

But I have a suggestion here. It is better to build your classes in a modular way so you can easily reuse them in other tasks with different datasets. For instance, you may need to transform 3 or 4 images, or to apply different transforms to each of them. In that case you would have to edit your ToTensor or Rescale class. So I think it is better to implement every transform class so that it operates on a single input; this is the approach PyTorch itself has chosen.

To illustrate: if you want to apply another transform, for example adding Gaussian noise to your image but not to the landmarks, you will be stuck again and will have to change your ToTensor code, because you are still returning a dictionary, or you will even end up calling one transform inside another. But if your classes take only one tensor as input and return the changed tensor, you can use all of your custom classes in any order and with any dataset you want.

By the way, I use the same approach as PyTorch, so I did not really think about your custom ToTensor implementation.
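
A minimal sketch of this single-input style, under stated assumptions: `ToTensorImage` and `TinyDataset` are hypothetical names, and the in-memory arrays stand in for images loaded from disk. The transform only ever sees the image, and the dataset returns the label separately.

```python
import numpy as np
import torch
from torch.utils.data import Dataset


class ToTensorImage(object):
    """Hypothetical single-input transform: H x W x C ndarray -> C x H x W float tensor."""

    def __call__(self, image):
        return torch.from_numpy(image.transpose(2, 0, 1).copy()).float()


class TinyDataset(Dataset):
    """Hypothetical dataset: the transform touches the image only."""

    def __init__(self, images, labels, transform=None):
        self.images = images
        self.labels = labels
        self.transform = transform

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        image = self.images[idx]
        if self.transform is not None:
            image = self.transform(image)
        # The label bypasses the image transforms entirely.
        return image, torch.tensor(self.labels[idx])


# Made-up data: two 4 x 4 RGB images with binary labels.
images = [np.zeros((4, 4, 3), dtype=np.float32),
          np.ones((4, 4, 3), dtype=np.float32)]
labels = [0, 1]

dataset = TinyDataset(images, labels, transform=ToTensorImage())
image, label = dataset[1]
print(image.shape, label)  # torch.Size([3, 4, 4]) tensor(1)
```

Because each transform takes one tensor and returns one tensor, a stock transform such as `transforms.Normalize` can be chained after it in `transforms.Compose` without any changes.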

usage in preprocessing step

github.com/Nikronic/CoarseNet/blob/master/utils/preprocess.py#L98-L101

          if self.transform is not None:
              X = self.transform(X)
              random.seed(seed)
              y_descreen = self.transform_gt(y_descreen)

usage in DataLoaders

github.com/Nikronic/CoarseNet/blob/master/Train.py#L147-L153

          # %% get dataset specific mean and std values
          train_dataset = PlacesDataset(txt_path=args.txt,
                                        img_dir=args.img,
                                        transform=ToTensor(),
                                        test=True)
          
          mean, std = OnlineMeanStd()(train_dataset, batch_size=1, method='strong')

Custom Transform

github.com/Nikronic/CoarseNet/blob/master/utils/preprocess.py#L109-L119

          class RandomNoise(object):
              def __init__(self, p, mean=0, std=0.1):
                  self.p = p
                  self.mean = mean
                  self.std = std
          
              def __call__(self, img):
                  if random.random() <= self.p:
                      noise = torch.empty(*img.size(), dtype=torch.float, requires_grad=False)
                      return img+noise.normal_(self.mean, self.std)
                  return img

Good luck