Understand data augmentation in PyTorch

Hi,

I was wondering if I could get a better understanding of data augmentation in PyTorch. From what I know, data augmentation is used to increase the number of data points when we are running low on them, so we use transforms to produce different versions of our data points. I am using data transformations like this:

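import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms

transform_img = transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        # transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])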
class dataload(Dataset):

    def __init__(self, x, transform=None):
        self.data = x
        self.transform = transform

    def __len__(self):
        return len(self.data)

    def __getitem__(self, i):

        img = Image.open(self.data[i])
        # img = img.transpose((2, 0, 1))
        # img = torch.from_numpy(img).float()
        # Parse the class index from this sample's own path instead of the global filenames list
        tmp = np.int32(self.data[i].split('/')[-1].split('_')[0][1])
        label = np.zeros(67)
        label[tmp] = 1
        label = torch.from_numpy(label).float()

        if self.transform:
            img = self.transform(img)

        return img, label


train_dataloader = dataload(filenames, transform=transform_img)

Now, it seems to work but I don’t get one thing. It does the transformation but it doesn’t increase the number of data points. I was hoping that each label would have 2 extra images since we are doing that transformation. But it doesn’t seem to do that. The total number of training samples is still the same.

So am I getting something wrong about augmentation or have I implemented this in the wrong way?

I’d call this alteration, not augmentation. Augmentation is when you create additional training samples. You need to move the transformations to __init__, transform every x, and add the results to the original data.

Also take a look at the timm library for augmentations; its CutMix and Mixup implementations helped me a lot in a recent project.

I want to create more training samples because I have little training data. I have 2000 images, so I was looking to increase that fourfold through augmentation.

Right, but in your code you’re not creating additional samples, you’re modifying existing ones.
(I just realized that your x (or self.data) in __init__ actually holds paths to files, so you can’t transform x in __init__ without additionally saving the transformed images.)
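
One detail that may help here, offered as a minimal sketch and not as a description of the code above: a random transform applied in __getitem__ leaves len(dataset) unchanged, but it is re-sampled on every access, so the same index can return a different variant each time it is read. The example uses in-memory tensors and a hypothetical RandomFlipDataset class instead of image files, purely for illustration.

```python
import torch
from torch.utils.data import Dataset


class RandomFlipDataset(Dataset):
    """Hypothetical toy dataset: flips each image horizontally with probability p."""

    def __init__(self, images, p=0.5):
        self.images = images
        self.p = p

    def __len__(self):
        # The random transform never changes the number of samples
        return len(self.images)

    def __getitem__(self, i):
        img = self.images[i]
        # The coin is tossed again on every access, not once per dataset
        if torch.rand(1).item() < self.p:
            img = torch.flip(img, dims=[-1])
        return img


# Five identical toy "images" of shape (channels=1, height=3, width=4)
images = [torch.arange(12.0).reshape(1, 3, 4) for _ in range(5)]
dataset = RandomFlipDataset(images)

# Read the same index 20 times and count how often the result was flipped
draws = [dataset[0] for _ in range(20)]
n_changed = sum(not torch.equal(d, images[0]) for d in draws)
print(len(dataset), n_changed)
```

The length stays at 5, while repeated reads of index 0 return a mix of flipped and unflipped tensors.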

Then the easiest way here is to run multiple training loops: one based on the dataset without a transform, another based on the dataset with the first transform, then another one, and so on.
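
A rough sketch of that idea, assuming in-memory tensors in place of image files: the TensorImages class and the hflip function are hypothetical stand-ins, and the actual forward/backward step is left as a comment.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class TensorImages(Dataset):
    """Hypothetical stand-in dataset: in-memory tensors plus an optional transform."""

    def __init__(self, images, transform=None):
        self.images = images
        self.transform = transform

    def __len__(self):
        return len(self.images)

    def __getitem__(self, i):
        img = self.images[i]
        return self.transform(img) if self.transform else img


def hflip(img):
    # Deterministic horizontal flip along the width axis
    return torch.flip(img, dims=[-1])


images = [torch.rand(3, 8, 8) for _ in range(6)]

# One loader per variant: untransformed first, then flipped
loaders = [
    DataLoader(TensorImages(images), batch_size=2),
    DataLoader(TensorImages(images, transform=hflip), batch_size=2),
]

samples_seen = 0
for epoch in range(2):
    for loader in loaders:
        for batch in loader:
            # forward pass, loss, backward pass and optimizer step would go here
            samples_seen += batch.size(0)

print(samples_seen)
```

Each epoch visits every image once per loader, so the model sees twice as many samples per epoch as with a single dataset.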

So is there any direct approach that does data augmentation in one go, like Keras?

I don’t think Keras does anything different.

Another option I can think of is to change the structure of self.data so that it holds not only file paths but also which kind of augmentation (if any) should be applied in __getitem__.

When I worked with Keras some time back, it used to increase the training data. I don’t know if things have changed.

I don’t know if this will help me increase the training data. It would only return the transformed image and not multiple images. I have written some code, and I was wondering if you could tell me whether this is what you meant by running in a loop:

    for epoch in range(num_epochs):
        running_loss_train = 0.0
        running_corrects_train, running_corrects_test = 0, 0
        for _ in tqdm(range(10)):
            # print('Epoch {}/{}'.format(epoch, num_epochs - 1))
            # print('-' * 10)
            # Each epoch has a training and validation phase
            model.train()  # Set model to training mode
            # Iterate over data.
            running_loss_aug = 0.0
            running_corrects_train_aug, running_corrects_test_aug = 0, 0
            for inputs, labels in (trainloader):
                inputs = inputs.cuda(0)
                labels = labels.cuda(1)
                # print(inputs, labels)
                # zero the parameter gradients
                optimizer.zero_grad()
                # forward
                # track history if only in train
                with torch.set_grad_enabled(True):
                    outputs = model(inputs)
                    # print(outputs)
                    
                    loss = criterion(outputs, labels)
                    # backward + optimize only if in training phase
                    loss.backward()
                    optimizer.step()
                # statistics
                running_loss_train += loss.item() * inputs.size(0)
                running_loss_aug += loss.item() * inputs.size(0)
                count=0
                for i in range(inputs.size(0)):
                    if(torch.argmax(outputs[i]) == torch.argmax(labels.data[i])):
                        count+=1
                running_corrects_train += count
                running_corrects_train_aug += count
            running_loss_aug = running_loss_aug / len(trainloader.dataset)
            running_corrects_train_aug = running_corrects_train_aug / len(trainloader.dataset)
            with torch.no_grad():
                model.eval()
                for inputs, labels in (testloader):

                    # Generate outputs
                    outputs = model(inputs)
                    count = 0
                    for i in range(inputs.size(0)):
                        if(torch.argmax(outputs[i]) == torch.argmax(labels.data[i])):
                            count+=1
                    running_corrects_test_aug += count
                    running_corrects_test += count
                epoch_acc_test = running_corrects_test_aug / len(testloader.dataset)
                acc_data_test.append(epoch_acc_test)
            # print('After each train: Train_Loss: {:.4f} Train_Acc: {:.4f} Test_Acc: {:.4f}'.format(
            #     running_loss_aug, running_corrects_train_aug, epoch_acc_test))
        scheduler.step()
        epoch_loss = running_loss_train / 20100.
        epoch_acc_train = running_corrects_train / 20100.
        epoch_acc_test = running_corrects_test / 20100
        loss_data_train.append(epoch_loss)
        acc_data_train.append(epoch_acc_train)

Let me show you what I mean:

class dataload(Dataset):

    def __init__(self, x, transform=None):
        self.data = [(el, 'none') for el in x]
        self.data.extend([(el, 'augm1') for el in x])
        self.transform = transform

    def __len__(self):
        return len(self.data)

    def __getitem__(self, i):

        img = Image.open(self.data[i][0])
        # img = img.transpose((2, 0, 1))
        # img = torch.from_numpy(img).float()
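        # Read the label from this record's own path; filenames[i] goes out of range once the list is doubled
        tmp = np.int32(self.data[i][0].split('/')[-1].split('_')[0][1])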
        label = np.zeros(67)
        label[tmp] = 1
        label = torch.from_numpy(label).float()
    
        if self.data[i][1] == 'augm1' and self.transform:
            img = self.transform(img)

        return img,label

This way self.data stores twice as many records: those with the augm1 flag trigger the augmentation, and the others don’t.
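
To make the record layout concrete, here is a small sketch with made-up file names (the paths are hypothetical). It shows that index i and index i + n point at the same file, which is why the label has to be derived from the record itself and not from a separate list indexed by i.

```python
paths = ['img_a.jpg', 'img_b.jpg', 'img_c.jpg']  # hypothetical file names

# First pass: every file without augmentation
data = [(p, 'none') for p in paths]
# Second pass: the same files again, flagged for augmentation
data.extend([(p, 'augm1') for p in paths])

n = len(paths)
# Records i and i + n refer to the same underlying file
same_file = all(data[i][0] == data[i + n][0] for i in range(n))

print(len(data), data[0], data[n], same_file)
```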

This did double the training data. Thanks. However, I have a question. Since I am using these transforms:

transform_img = transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

So shouldn’t the increase be 4x?

Also, I made a change to the code because it was returning PIL Image type but non-transformed images:

class dataload(Dataset):

    def __init__(self, x, transform=None):
        self.data = [(el, 'none') for el in x]
        self.data.extend([(el, 'augm1') for el in x])
        self.transform = transform

    def __len__(self):
        return len(self.data)

    def __getitem__(self, i):


        img = mpimg.imread(self.data[i][0]) - [103.939, 116.779, 123.68]
        img = img.transpose((2, 0, 1))
        img = torch.from_numpy(img).float()
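        # Read the label from this record's own path; filenames[i] goes out of range once the list is doubled
        tmp = np.int32(self.data[i][0].split('/')[-1].split('_')[0][1])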
        label = np.zeros(67)
        label[tmp] = 1
        label = torch.from_numpy(label).float()
    
        if self.data[i][1] == 'augm1' and self.transform:
            img = Image.open(self.data[i][0])
            img = self.transform(img)

        return img,label

Let me know what you think.

Nope, these are not separate transformations but a single sequence applied to each image.
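
A dependency-free sketch of the point: a composed pipeline feeds the output of each step into the next and returns exactly one result per input. The compose helper below is a hypothetical stand-in that mimics the chaining behaviour of transforms.Compose on a toy list-of-rows "image"; it is not the torchvision class itself.

```python
def compose(steps):
    """Hypothetical stand-in for transforms.Compose: chain the steps into one callable."""
    def run(x):
        for step in steps:
            x = step(x)  # each step receives the previous step's output
        return x
    return run


def crop_left_two(img):
    # Toy "crop": keep the first two columns of every row
    return [row[:2] for row in img]


def flip_horizontal(img):
    # Toy "flip": reverse every row
    return [row[::-1] for row in img]


pipeline = compose([crop_left_two, flip_horizontal])

image = [[1, 2, 3], [4, 5, 6]]
result = pipeline(image)
print(result)  # one cropped-and-flipped image, not two separate images
```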

So is it possible that I get a separate transformation for each image?

You can create multiple different transform sequences, then extend self.data a few more times with different augmentation flags and run a different transform depending on the value of the flag.

Something like this?

def __init__(self, x, transform=None):
    self.data = [(el, 'none') for el in x]
    self.data.extend([(el, 'augm1') for el in x])
    self.data.extend([(el, 'augm2') for el in x])
    self.data.extend([(el, 'augm3') for el in x])
    self.transform = transform

and then

# Each record is a (path, flag) pair, so the flag is always at index 1
if self.data[i][1] == 'augm1' and self.transform:
    img = Image.open(self.data[i][0])
    img = self.transform(img)
elif self.data[i][1] == 'augm2' and self.transform:
    img = Image.open(self.data[i][0])
    img = self.transform(img)
elif self.data[i][1] == 'augm3' and self.transform:
    img = Image.open(self.data[i][0])
    img = self.transform(img)

Yup, just one thing - you are applying the same transform every time. You need to define additional transform sequences, for example one with a horizontal flip and another with resize and crop.

transform_crop = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

Also, I would still advise you to look into timm’s library of augmentations. CutMix and Mixup helped me a lot lately; accuracy went up from 94.5% to 99.37%.
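
For readers unfamiliar with Mixup, here is a from-scratch sketch of the basic idea in plain PyTorch. It is not timm’s implementation; the mixup_batch function and the alpha=0.2 default are assumptions chosen for illustration. It blends each sample with another sample from the same batch and blends the one-hot labels with the same weight, which fits the one-hot float labels used earlier in this thread.

```python
import torch


def mixup_batch(inputs, targets, alpha=0.2):
    """Hypothetical helper: blend a batch with a shuffled copy of itself."""
    # Mixing weight drawn from a Beta(alpha, alpha) distribution
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    # Random pairing of samples within the batch
    perm = torch.randperm(inputs.size(0))
    mixed_inputs = lam * inputs + (1.0 - lam) * inputs[perm]
    mixed_targets = lam * targets + (1.0 - lam) * targets[perm]
    return mixed_inputs, mixed_targets


# Toy batch: 4 images and one-hot labels over 67 classes
inputs = torch.rand(4, 3, 8, 8)
targets = torch.zeros(4, 67)
targets[torch.arange(4), torch.tensor([0, 5, 12, 66])] = 1.0

mixed_inputs, mixed_targets = mixup_batch(inputs, targets)
print(mixed_inputs.shape, mixed_targets.sum(dim=1))
```

Because the two weights sum to one, each mixed label row still sums to 1, so it can be used with a loss that accepts soft targets.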

Can you elaborate on that? Because I have created something like that as mentioned here

I’ll definitely check out the library.

def __init__(self, x):
        self.data = [(el, 'none') for el in x]
        self.data.extend([(el, 'augm1') for el in x])
        self.data.extend([(el, 'augm2') for el in x])
        self.data.extend([(el, 'augm3') for el in x])
        self.transform1 = transforms.Compose([
                transforms.RandomResizedCrop(224),
                transforms.ToTensor(),
                transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
        ])
        self.transform2 = transforms.Compose([
                transforms.RandomHorizontalFlip(),
                transforms.ToTensor(),
                transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
        ])
        self.transform3 = transforms.Compose([
                transforms.ToTensor(),
                transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
                # RandomErasing works on tensors, not PIL images, so it has to come after ToTensor
                transforms.RandomErasing(),
        ])

and

# Pick the transform sequence that matches this record's flag
if self.data[i][1] == 'augm1':
    img = self.transform1(img)
elif self.data[i][1] == 'augm2':
    img = self.transform2(img)
elif self.data[i][1] == 'augm3':
    img = self.transform3(img)
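
Putting the two pieces together, a self-contained sketch might look like the following. It assumes in-memory tensors instead of image files, and the FlaggedDataset class, the flag names, and the flip functions are hypothetical stand-ins for the transform sequences discussed above.

```python
import torch
from torch.utils.data import DataLoader, Dataset


def identity(img):
    return img


def hflip(img):
    return torch.flip(img, dims=[-1])  # flip along the width axis


def vflip(img):
    return torch.flip(img, dims=[-2])  # flip along the height axis


class FlaggedDataset(Dataset):
    """Hypothetical dataset: one record per (image index, augmentation flag)."""

    def __init__(self, images, labels, transforms_by_flag):
        self.images = images
        self.labels = labels
        self.transforms_by_flag = transforms_by_flag
        # Every image appears once per flag, grouped flag by flag
        self.records = [
            (idx, flag)
            for flag in transforms_by_flag
            for idx in range(len(images))
        ]

    def __len__(self):
        return len(self.records)

    def __getitem__(self, i):
        idx, flag = self.records[i]
        # Look up the transform for this record's flag and apply it
        img = self.transforms_by_flag[flag](self.images[idx])
        return img, self.labels[idx]


images = [torch.rand(3, 8, 8) for _ in range(5)]
labels = [torch.tensor(float(k)) for k in range(5)]
dataset = FlaggedDataset(
    images, labels, {'none': identity, 'augm1': hflip, 'augm2': vflip}
)

loader = DataLoader(dataset, batch_size=4, shuffle=True)
batch_images, batch_labels = next(iter(loader))
print(len(dataset), batch_images.shape)
```

Keeping the flag-to-transform mapping in a dictionary avoids a growing if/elif chain, and every record returns a tensor, so the default batching in DataLoader works for all flags.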

Ah. Thanks a lot. I will try this.

Hi, I am trying to understand what you suggest, @my3bikaht, and I want to apply it to my dataset. I am working with the FaceLandmarks dataset from the official PyTorch example. Instead of applying transformations on the fly at every epoch, I want to create additional data, that is, the original samples plus the transformed ones, and use both for training. But I got confused about how to do it. This is what I tried:

class FaceLandmarksDataset(Dataset):

  def __init__(self, csv_file, root_dir, transform=None):
    """
    Arguments:
        csv_file (string): Path to the csv file with annotations.
        root_dir (string): Directory with all the images.
        transform (callable, optional): Optional transform to be applied
            on a sample.
    """
    self.landmarks_frame = pd.read_csv(csv_file)
    self.root_dir = root_dir
    self.transform = transform

  def __len__(self):
    return len(self.landmarks_frame)

  def __getitem__(self, idx):
    if torch.is_tensor(idx):
        idx = idx.tolist()
        
    self.transform = transforms.Compose([Rescale,
            RandomCrop,
            ToTensor()])
    img_name = os.path.join(self.root_dir,
                            self.landmarks_frame.iloc[idx, 0])
    
    image = io.imread(img_name)
    landmarks = self.landmarks_frame.iloc[idx, 1:]
    landmarks = np.array([landmarks], dtype=float).reshape(-1, 2)
    
    # Assign a label indicating 'none' augmentation by default
    augmentation_label = 'none'
    
    # Apply transformation if it exists and is a list with augmentation labels
    if self.transform:
        if isinstance(self.transform, list):
            augmentation_label = self.transform[idx % len(self.transform)]
    
    sample = {'image': image, 'landmarks': landmarks, 'augmentation_label': augmentation_label}

    if self.transform and augmentation_label != 'none':
        sample = self.transform(sample)
    sample = {'image': image, 'landmarks': landmarks}

    return sample

  # dataloader for the dataset ---------------------------------
transformed_dataset =FaceLandmarksDataset(csv_file='C:/Users/abir/Desktop/faces_dataset/faces/faces/face_landmarks.csv',root_dir='C:/Users/abir/Desktop/faces_dataset/faces/faces/')
                                      
dataloader = DataLoader(transformed_dataset, batch_size=4,
                    shuffle=True, num_workers=0)

Then I got confused about how to load the original and the transformed data with the DataLoader. Any suggestions?
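
One possible approach, sketched under assumptions and not tested against the FaceLandmarks data: build the dataset twice, once without and once with a transform, and join the two with torch.utils.data.ConcatDataset before handing the result to the DataLoader. ToyLandmarksDataset and darken below are hypothetical stand-ins that use random in-memory tensors; darken only changes pixel values, because a geometric transform such as a crop or flip would also have to update the landmarks.

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, Dataset


class ToyLandmarksDataset(Dataset):
    """Hypothetical stand-in for FaceLandmarksDataset using in-memory tensors."""

    def __init__(self, n_samples, transform=None):
        # Fixed seed so that both instances hold the same underlying data
        gen = torch.Generator().manual_seed(0)
        self.images = torch.rand(n_samples, 3, 16, 16, generator=gen)
        self.landmarks = torch.rand(n_samples, 68, 2, generator=gen)
        self.transform = transform

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        sample = {'image': self.images[idx], 'landmarks': self.landmarks[idx]}
        if self.transform:
            sample = self.transform(sample)
        return sample


def darken(sample):
    # Photometric change only: the landmark coordinates stay valid
    return {'image': sample['image'] * 0.5, 'landmarks': sample['landmarks']}


original = ToyLandmarksDataset(10)
augmented = ToyLandmarksDataset(10, transform=darken)

# Indices 0..9 come from `original`, indices 10..19 from `augmented`
combined = ConcatDataset([original, augmented])

loader = DataLoader(combined, batch_size=4, shuffle=True, num_workers=0)
batch = next(iter(loader))
print(len(combined), batch['image'].shape, batch['landmarks'].shape)
```

With shuffle=True, each batch mixes original and transformed samples, and the transform is passed in through the constructor instead of being set inside __getitem__.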