How to perform data augmentation for segmentation in PyTorch?

I am using PyTorch for semantic segmentation, but I am facing a problem because I work with images and their masks/labels. I want to perform data augmentation such as RandomHorizontalFlip and RandomCrop on both of them together.

Here is my code. Please check it and let me know how I can embed the above operations in it.

import torchvision.transforms.functional as F
from torchvision import transforms

class ToTensor(object):
    def __call__(self, sample):
        image, label = sample['image'], sample['label']
        return {'image': F.to_tensor(image), 'label': F.to_tensor(label)}

my_transform = transforms.Compose([ToTensor()])

dataset = Mydataset(image_dir, label_dir, transform=my_transform)

Printing the dataset output:

dataset[1]

Output

{'image': tensor([[[0.0824, 0.0824, 0.0824,  ..., 0.0824, 0.0824, 0.0824],
      [0.0824, 0.0824, 0.0824,  ..., 0.0824, 0.0824, 0.0824],
      [0.0824, 0.0824, 0.0824,  ..., 0.0824, 0.0824, 0.0824],
      ...,
      [0.0902, 0.0902, 0.0902,  ..., 0.0824, 0.0824, 0.0824],
      [0.0824, 0.0824, 0.0824,  ..., 0.0824, 0.0824, 0.0824],
      [0.0745, 0.0745, 0.0745,  ..., 0.0824, 0.0824, 0.0824]],

     [[0.0824, 0.0824, 0.0824,  ..., 0.0824, 0.0824, 0.0824],
      [0.0824, 0.0824, 0.0824,  ..., 0.0824, 0.0824, 0.0824],
      [0.0824, 0.0824, 0.0824,  ..., 0.0824, 0.0824, 0.0824],
      ...,
      [0.0902, 0.0902, 0.0902,  ..., 0.0824, 0.0824, 0.0824],
      [0.0824, 0.0824, 0.0824,  ..., 0.0824, 0.0824, 0.0824],
      [0.0745, 0.0745, 0.0745,  ..., 0.0824, 0.0824, 0.0824]],

     [[0.0824, 0.0824, 0.0824,  ..., 0.0824, 0.0824, 0.0824],
      [0.0824, 0.0824, 0.0824,  ..., 0.0824, 0.0824, 0.0824],
      [0.0824, 0.0824, 0.0824,  ..., 0.0824, 0.0824, 0.0824],
      ...,
      [0.0902, 0.0902, 0.0902,  ..., 0.0824, 0.0824, 0.0824],
      [0.0824, 0.0824, 0.0824,  ..., 0.0824, 0.0824, 0.0824],
      [0.0745, 0.0745, 0.0745,  ..., 0.0824, 0.0824, 0.0824]]]),

 'label': tensor([[[0.0824, 0.0824, 0.0824,  ..., 0.0824, 0.0824, 0.0824],
      [0.0824, 0.0824, 0.0824,  ..., 0.0824, 0.0824, 0.0824],
      [0.0824, 0.0824, 0.0824,  ..., 0.0824, 0.0824, 0.0824],
      ...,
      [0.0902, 0.0902, 0.0902,  ..., 0.0824, 0.0824, 0.0824],
      [0.0824, 0.0824, 0.0824,  ..., 0.0824, 0.0824, 0.0824],
      [0.0745, 0.0745, 0.0745,  ..., 0.0824, 0.0824, 0.0824]]])}

Please, can someone answer? :frowning:

Hi there. One way to perform the transformation on both the data and the label is to write your own custom Dataset class, for example:


import torch
from torch.utils.data import Dataset

class CustomTensorDataset(Dataset):
    """Wraps image/label tensors and applies the same transform to both."""
    def __init__(self, tensors, transform=None):
        # Every tensor must contain the same number of samples
        assert all(tensor.size(0) == tensors[0].size(0) for tensor in tensors)
        self.tensors = tensors
        self.transform = transform

    def __getitem__(self, index):
        x = self.tensors[0][index]   # image
        y = self.tensors[1][index]   # mask / label
        if self.transform:
            x = self.transform(x)
            y = self.transform(y)
        return x, y

    def __len__(self):
        return len(self.tensors[0])

# Flip a (C, H, W) tensor horizontally (along the width dimension)
def hflip(tensor):
    return tensor.flip(2)

traindataset = CustomTensorDataset(tensors=(X_train, y_train), transform=hflip)
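
For completeness, a minimal usage sketch (the tensor shapes below are made-up placeholders, not from the original post): stack the images and masks, wrap them in the dataset, and iterate with a DataLoader. Because hflip is deterministic, each image and its mask stay aligned; a random transform applied separately to x and y would not, which is what the RNG-state trick further down addresses.

import torch
from torch.utils.data import DataLoader

# Made-up example data: 8 RGB images and 8 single-channel masks
X_train = torch.rand(8, 3, 64, 64)
y_train = torch.randint(0, 2, (8, 1, 64, 64)).float()

traindataset = CustomTensorDataset(tensors=(X_train, y_train), transform=hflip)
trainloader = DataLoader(traindataset, batch_size=4, shuffle=True)

for images, masks in trainloader:
    # images: (4, 3, 64, 64), masks: (4, 1, 64, 64), flipped identically
    print(images.shape, masks.shape)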

Hi. You can see a use case of image augmentation with transforms here:

Sample code

Example transform functions
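
In the same spirit, here is a minimal sketch of paired transform functions written in the same dict style as the ToTensor class from the question. The class names are illustrative (not taken from the linked code); they apply identical random parameters to the image and the label.

import random
import torchvision.transforms.functional as F
from torchvision import transforms

class PairedRandomHorizontalFlip(object):
    """Flips image and label together with probability p (illustrative sketch)."""
    def __init__(self, p=0.5):
        self.p = p

    def __call__(self, sample):
        image, label = sample['image'], sample['label']
        if random.random() < self.p:
            image = F.hflip(image)
            label = F.hflip(label)
        return {'image': image, 'label': label}

class PairedRandomCrop(object):
    """Crops the same random window from image and label (illustrative sketch)."""
    def __init__(self, size):
        self.size = size  # (height, width)

    def __call__(self, sample):
        image, label = sample['image'], sample['label']
        i, j, h, w = transforms.RandomCrop.get_params(image, output_size=self.size)
        image = F.crop(image, i, j, h, w)
        label = F.crop(label, i, j, h, w)
        return {'image': image, 'label': label}

These compose with the ToTensor class from the question, e.g. transforms.Compose([PairedRandomHorizontalFlip(), PairedRandomCrop((256, 256)), ToTensor()]), with the paired transforms placed before ToTensor so they receive the PIL images.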

Here is what I do for data augmentation in semantic segmentation.

First I define a composed transform such as

import torchvision.transforms as tf

transf_aug = tf.Compose([tf.RandomHorizontalFlip(),
                         tf.RandomResizedCrop((height, width), scale=(0.7, 1.0))])

Then, during the training phase, I apply the transformation to each image and mask. Since each call to transf_aug produces a different random transformation, we use the following trick (based on this comment) to ensure that the same transformation is applied to the image and to the mask:

state = torch.get_rng_state()
img = transf_aug(img)
torch.set_rng_state(state)
mask = transf_aug(mask)
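
Putting it together, a minimal sketch of how that trick could live inside a custom Dataset's __getitem__. The SegmentationDataset name and the tensor shapes are my own placeholders, and it assumes a torchvision version whose random transforms draw from the global torch RNG (recent versions do) and accept tensor inputs.

import torch
import torchvision.transforms as tf
from torch.utils.data import Dataset

class SegmentationDataset(Dataset):
    """Applies the same random transform to an image and its mask (illustrative sketch)."""
    def __init__(self, images, masks, transform=None):
        self.images = images          # sequence of (C, H, W) image tensors
        self.masks = masks            # sequence of (1, H, W) mask tensors
        self.transform = transform

    def __len__(self):
        return len(self.images)

    def __getitem__(self, index):
        img, mask = self.images[index], self.masks[index]
        if self.transform:
            # Save the RNG state, transform the image, restore the state,
            # then transform the mask so both get the same random parameters.
            state = torch.get_rng_state()
            img = self.transform(img)
            torch.set_rng_state(state)
            mask = self.transform(mask)
        return img, mask

transf_aug = tf.Compose([tf.RandomHorizontalFlip(),
                         tf.RandomResizedCrop((64, 64), scale=(0.7, 1.0))])

# Placeholder data: 4 RGB images with single-channel masks
images = [torch.rand(3, 128, 128) for _ in range(4)]
masks = [torch.randint(0, 2, (1, 128, 128)).float() for _ in range(4)]
dataset = SegmentationDataset(images, masks, transform=transf_aug)
img, mask = dataset[0]   # img and mask are flipped/cropped identically

Note that for integer class masks you may want to use nearest-neighbour interpolation in RandomResizedCrop, so class values are not blended by the default bilinear resampling.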

Hope it is useful,
Pablo.
