Semantic Segmentation: Transforming Training Image + Mask Without Transforming Validation Data

joshuadavidwood · February 12, 2021, 10:09pm

Hi All, I’m currently learning semantic segmentation by working through some common problems. As I understand it, you should apply augmentations only to the training dataset and never to the validation or test dataset. When working on the Data Science Bowl 2018 dataset, which involves segmenting and counting cells, I found that entries seemed to avoid using augmentations (see: Simple Unet - Pytorch | Kaggle). However, I am unsure how to apply transformations when I use the Dataset class to load data while ensuring that augmentations are applied only to the training portion of my data. As you can see in my Dataset class and data loader below:

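# Imports inferred from the names used below.
import os
import random

import numpy as np
import torch
from PIL import Image
from skimage import io, transform
from torch.utils.data import Dataset, random_split
from torchvision import transforms
import torchvision.transforms.functional as TF

class ImageDataset(Dataset):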
        def __init__(self, root_dir, dimension=256, augment=False):
            self.root_dir = root_dir # Define root directory path.
            self.folders = sorted(os.listdir(root_dir)) # Sort so the sample order is deterministic.
            self.augment = augment
            self.dimension = dimension # Define HxW length.
            
        def transform(self, image, mask):
            
            resize = transforms.Resize(size=(self.dimension,self.dimension))
            image = resize(image) # Resize image using (dimension,dimension).
            mask = resize(mask) # Resize mask using (dimension,dimension).

            if self.augment:
                # Rotate
                angle = random.randint(-25, 25) # Rotate image and mask using random angle between -25 and 25.
                image = TF.rotate(image, angle)
                mask = TF.rotate(mask, angle)

                # Randomly Horizontally Flip
                if random.random() >= 0.5: # Flip if P >= 0.5.
                    image = TF.hflip(image)
                    mask = TF.hflip(mask)

                # Randomly Vertically Flip
                if random.random() >= 0.5: # Flip if P >= 0.5.
                    image = TF.vflip(image)
                    mask = TF.vflip(mask)

            # Transform to Tensor (done whether or not augmentation is enabled).
            image = TF.to_tensor(image) # Scale pixels from [0,255] to [0,1].
            mask = TF.to_tensor(mask) # Convert mask to a tensor.

            return image, mask
        
        def __len__(self):
            return len(self.folders)
              
        def __getitem__(self, idx):
            image_folder = os.path.join(self.root_dir, self.folders[idx], 'images/')
            mask_folder = os.path.join(self.root_dir, self.folders[idx], 'masks/')
            image_path = os.path.join(image_folder, os.listdir(image_folder)[0])
            image = Image.open(image_path) # Create PIL image.
            image = np.array(image)
            image = image[:,:,:3] # Select 1,2,3 image channels.
            image = Image.fromarray(image) # Convert numpy array to PIL image.
            
            # Merge the per-cell masks into one mask; the np.bool alias was removed in NumPy 1.24, so use bool.
            mask = np.zeros((self.dimension,self.dimension,1), dtype=bool)
            for mask_file in os.listdir(mask_folder):
                sub_mask = io.imread(os.path.join(mask_folder, mask_file))
                sub_mask = transform.resize(sub_mask, (self.dimension,self.dimension))
                sub_mask = np.expand_dims(sub_mask, axis=-1)
                mask = np.maximum(mask, sub_mask)
      
            mask = np.squeeze(mask, axis=2)
            mask = Image.fromarray(mask) # Create PIL image using array.
            
            image, mask = self.transform(image, mask) # Apply Transforms.
        
            return (image, mask)

train_dir = 'data/stage1_train/'
dataset = ImageDataset(train_dir, augment=False)

train_size = int(len(dataset) * 0.8)
validation_size = len(dataset) - train_size # Remainder, so the two sizes always sum to len(dataset).
train_dataset, validation_dataset = random_split(dataset, [train_size, validation_size])

train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=16, num_workers=0, shuffle=True)
validation_loader = torch.utils.data.DataLoader(validation_dataset, batch_size=16, num_workers=0)

I use my Dataset class to load the data, currently with augment=False, so only the resize and tensor conversion are applied to the image and mask:

dataset = ImageDataset(train_dir, augment=False)

After loading the dataset, I hold out 20% of it for validation. However, if I set augment=True, the validation data is augmented as well, because both splits come from the same Dataset instance.

When trying to think of solutions, I considered loading the training data via a function, defining my dataset using image and mask arrays as arguments, and splitting these before using a dataset class. Then I could apply the transformations freely to the training split and not apply them to the validation split. I was wondering if there is a more elegant way to approach this problem.
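A minimal sketch of the split-first idea, applied to the sample folder names rather than to loaded arrays. The helper `split_folders` and the sample names are hypothetical, and the sketch assumes `ImageDataset` would be changed to accept a list of folders instead of calling `os.listdir` itself:

```python
import random

def split_folders(folders, validation_fraction=0.2, seed=0):
    """Shuffle folder names reproducibly and split them into train/validation lists.

    Hypothetical helper, not part of PyTorch.
    """
    folders = sorted(folders)            # Fixed starting order, independent of the filesystem.
    rng = random.Random(seed)            # Local RNG so the global random state is untouched.
    rng.shuffle(folders)
    n_val = int(len(folders) * validation_fraction)
    return folders[n_val:], folders[:n_val]

# Placeholder names standing in for the sample directories under the data root.
all_folders = [f"sample_{i}" for i in range(10)]
train_folders, validation_folders = split_folders(all_folders)
```

Each list could then be passed to its own dataset, one built with augment=True and the other with augment=False.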

Dwight_Foster · February 12, 2021, 11:25pm

I think the best way is probably, like you said, to define two datasets: one for your training data and one for validation.
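One way this could look, as a rough sketch rather than a definitive implementation: create two instances of the same dataset class that differ only in the augment flag, then wrap each in `torch.utils.data.Subset` using disjoint halves of one shuffled index list. `ToyDataset` below is a hypothetical stand-in for `ImageDataset` so that the example runs without any data on disk.

```python
import torch
from torch.utils.data import Dataset, Subset

class ToyDataset(Dataset):
    """Hypothetical stand-in for ImageDataset: returns (index, augment flag)."""
    def __init__(self, n, augment=False):
        self.n = n
        self.augment = augment

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        return idx, self.augment

n = 10
train_full = ToyDataset(n, augment=True)          # Augmentation on.
validation_full = ToyDataset(n, augment=False)    # Augmentation off.

# One shuffled index list shared by both instances, so the splits cannot overlap.
generator = torch.Generator().manual_seed(0)
indices = torch.randperm(n, generator=generator).tolist()
train_size = int(0.8 * n)

train_dataset = Subset(train_full, indices[:train_size])
validation_dataset = Subset(validation_full, indices[train_size:])
```

The two subsets can be handed to `DataLoader` exactly as in the original post. This relies on both instances listing their samples in the same order, so sorting the folder list inside the dataset class would be a sensible precaution.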