How to perform data augmentation for segmentation in PyTorch?

I am using PyTorch for semantic segmentation, but I am facing a problem because I work with images and their masks/labels. I want to perform data augmentation such as RandomHorizontalFlip and RandomCrop on both of them together.

Here is my code. Please check it and let me know how I can embed the above operations in it.

import torchvision.transforms.functional as F
from torchvision import transforms

class ToTensor(object):
    def __call__(self, sample):
        image, label = sample['image'], sample['label']
        return {'image': F.to_tensor(image), 'label': F.to_tensor(label)}

my_transform = transforms.Compose([ToTensor()])

dataset = Mydataset(image_dir, label_dir, transform=my_transform)

Printing the dataset output:

dataset[1]

Output

{'image': tensor([[[0.0824, 0.0824, 0.0824,  ..., 0.0824, 0.0824, 0.0824],
      [0.0824, 0.0824, 0.0824,  ..., 0.0824, 0.0824, 0.0824],
      [0.0824, 0.0824, 0.0824,  ..., 0.0824, 0.0824, 0.0824],
      ...,
      [0.0902, 0.0902, 0.0902,  ..., 0.0824, 0.0824, 0.0824],
      [0.0824, 0.0824, 0.0824,  ..., 0.0824, 0.0824, 0.0824],
      [0.0745, 0.0745, 0.0745,  ..., 0.0824, 0.0824, 0.0824]],

     [[0.0824, 0.0824, 0.0824,  ..., 0.0824, 0.0824, 0.0824],
      [0.0824, 0.0824, 0.0824,  ..., 0.0824, 0.0824, 0.0824],
      [0.0824, 0.0824, 0.0824,  ..., 0.0824, 0.0824, 0.0824],
      ...,
      [0.0902, 0.0902, 0.0902,  ..., 0.0824, 0.0824, 0.0824],
      [0.0824, 0.0824, 0.0824,  ..., 0.0824, 0.0824, 0.0824],
      [0.0745, 0.0745, 0.0745,  ..., 0.0824, 0.0824, 0.0824]],

     [[0.0824, 0.0824, 0.0824,  ..., 0.0824, 0.0824, 0.0824],
      [0.0824, 0.0824, 0.0824,  ..., 0.0824, 0.0824, 0.0824],
      [0.0824, 0.0824, 0.0824,  ..., 0.0824, 0.0824, 0.0824],
      ...,
      [0.0902, 0.0902, 0.0902,  ..., 0.0824, 0.0824, 0.0824],
      [0.0824, 0.0824, 0.0824,  ..., 0.0824, 0.0824, 0.0824],
      [0.0745, 0.0745, 0.0745,  ..., 0.0824, 0.0824, 0.0824]]]),

 'label': tensor([[[0.0824, 0.0824, 0.0824,  ..., 0.0824, 0.0824, 0.0824],
      [0.0824, 0.0824, 0.0824,  ..., 0.0824, 0.0824, 0.0824],
      [0.0824, 0.0824, 0.0824,  ..., 0.0824, 0.0824, 0.0824],
      ...,
      [0.0902, 0.0902, 0.0902,  ..., 0.0824, 0.0824, 0.0824],
      [0.0824, 0.0824, 0.0824,  ..., 0.0824, 0.0824, 0.0824],
      [0.0745, 0.0745, 0.0745,  ..., 0.0824, 0.0824, 0.0824]]])}

Please, can someone answer? :frowning:

Hi there. One way to perform the transformation on both the data and the label is to write your own custom Dataset class, for example:


import torch
from torch.utils.data import Dataset

class CustomTensorDataset(Dataset):
    """Wraps image/label tensors and applies the same transform to both."""
    def __init__(self, tensors, transform=None):
        # Every tensor must contain the same number of samples
        assert all(tensor.size(0) == tensors[0].size(0) for tensor in tensors)
        self.tensors = tensors
        self.transform = transform

    def __getitem__(self, index):
        x = self.tensors[0][index]   # image
        y = self.tensors[1][index]   # mask / label
        if self.transform:
            x = self.transform(x)
            y = self.transform(y)
        return x, y

    def __len__(self):
        return len(self.tensors[0])

# Flip a (C, H, W) tensor horizontally (along the width dimension)
def hflip(tensor):
    return tensor.flip(2)

traindataset = CustomTensorDataset(tensors=(X_train, y_train), transform=hflip)
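
For completeness, a minimal usage sketch (the tensor shapes below are made-up placeholders, not from the original post): stack the images and masks, wrap them in the dataset, and iterate with a DataLoader. Because hflip is deterministic, each image and its mask stay aligned; a random transform applied separately to x and y would not, which is what the RNG-state trick further down addresses.

import torch
from torch.utils.data import DataLoader

# Made-up example data: 8 RGB images and 8 single-channel masks
X_train = torch.rand(8, 3, 64, 64)
y_train = torch.randint(0, 2, (8, 1, 64, 64)).float()

traindataset = CustomTensorDataset(tensors=(X_train, y_train), transform=hflip)
trainloader = DataLoader(traindataset, batch_size=4, shuffle=True)

for images, masks in trainloader:
    # images: (4, 3, 64, 64), masks: (4, 1, 64, 64), flipped identically
    print(images.shape, masks.shape)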

Hi. You can see a use case of image augmentation with transforms here:

Sample code

Example transform functions
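
In the same spirit, here is a minimal sketch of paired transform functions written in the same dict style as the ToTensor class from the question. The class names are illustrative (not taken from the linked code); they apply identical random parameters to the image and the label.

import random
import torchvision.transforms.functional as F
from torchvision import transforms

class PairedRandomHorizontalFlip(object):
    """Flips image and label together with probability p (illustrative sketch)."""
    def __init__(self, p=0.5):
        self.p = p

    def __call__(self, sample):
        image, label = sample['image'], sample['label']
        if random.random() < self.p:
            image = F.hflip(image)
            label = F.hflip(label)
        return {'image': image, 'label': label}

class PairedRandomCrop(object):
    """Crops the same random window from image and label (illustrative sketch)."""
    def __init__(self, size):
        self.size = size  # (height, width)

    def __call__(self, sample):
        image, label = sample['image'], sample['label']
        i, j, h, w = transforms.RandomCrop.get_params(image, output_size=self.size)
        image = F.crop(image, i, j, h, w)
        label = F.crop(label, i, j, h, w)
        return {'image': image, 'label': label}

These compose with the ToTensor class from the question, e.g. transforms.Compose([PairedRandomHorizontalFlip(), PairedRandomCrop((256, 256)), ToTensor()]), with the paired transforms placed before ToTensor so they receive the PIL images.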

Here is what I do for data augmentation in semantic segmentation.

First I define a composed transform such as

import torchvision.transforms as tf

transf_aug = tf.Compose([tf.RandomHorizontalFlip(),
                         tf.RandomResizedCrop((height, width), scale=(0.7, 1.0))])

Then, during the training phase, I apply the transformation to each image and mask. Since each call to transf_aug produces a different random transformation, we use the following trick (based on this comment) to ensure that the same transformation is applied to the image and to the mask:

state = torch.get_rng_state()
img = transf_aug(img)
torch.set_rng_state(state)
mask = transf_aug(mask)
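
Putting it together, a minimal sketch of how that trick could live inside a custom Dataset's __getitem__. The SegmentationDataset name and the tensor shapes are my own placeholders, and it assumes a torchvision version whose random transforms draw from the global torch RNG (recent versions do) and accept tensor inputs.

import torch
import torchvision.transforms as tf
from torch.utils.data import Dataset

class SegmentationDataset(Dataset):
    """Applies the same random transform to an image and its mask (illustrative sketch)."""
    def __init__(self, images, masks, transform=None):
        self.images = images          # sequence of (C, H, W) image tensors
        self.masks = masks            # sequence of (1, H, W) mask tensors
        self.transform = transform

    def __len__(self):
        return len(self.images)

    def __getitem__(self, index):
        img, mask = self.images[index], self.masks[index]
        if self.transform:
            # Save the RNG state, transform the image, restore the state,
            # then transform the mask so both get the same random parameters.
            state = torch.get_rng_state()
            img = self.transform(img)
            torch.set_rng_state(state)
            mask = self.transform(mask)
        return img, mask

transf_aug = tf.Compose([tf.RandomHorizontalFlip(),
                         tf.RandomResizedCrop((64, 64), scale=(0.7, 1.0))])

# Placeholder data: 4 RGB images with single-channel masks
images = [torch.rand(3, 128, 128) for _ in range(4)]
masks = [torch.randint(0, 2, (1, 128, 128)).float() for _ in range(4)]
dataset = SegmentationDataset(images, masks, transform=transf_aug)
img, mask = dataset[0]   # img and mask are flipped/cropped identically

Note that for integer class masks you may want to use nearest-neighbour interpolation in RandomResizedCrop, so class values are not blended by the default bilinear resampling.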

Hope it is useful,
Pablo.
