Data Augmentation after creating Dataset

Bhavya_Soni · May 10, 2021, 1:49pm

Hello there. I’m new to PyTorch. I’ve created a dataset of x-ray images with transforms already applied, but I’m not getting good test accuracy, so I’ve decided to add augmentation. However, I don’t know how to apply augmentation to an already created dataset.

test_loader = data['test_loader']
train_loader = data['train_loader']
train_dataset = data['train_dataset']
test_dataset = data['test_dataset']

Here, I’ve loaded previously saved image tensors into train_dataset and test_dataset.

I want to augment train_dataset directly. Can anyone tell me how to do that?

ariG23498 · May 10, 2021, 2:16pm

You can create a Compose of augmentations and then use it in the training loop itself.

aug = Compose(<the list of augmentations>)

for x,y in dataloader:
    x_aug = aug(x)

I think this might do the trick.

Bhavya_Soni · May 10, 2021, 3:56pm

But x_aug will be overwritten on every iteration, so at the end of the loop only the last batch will be augmented, I guess.

Bhavya_Soni · May 10, 2021, 4:13pm

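import torch
from torchvision import transforms

# p=1 flips every time; RandomAffine rotates by a random angle in [-20, 20] degrees
aug = transforms.Compose([transforms.RandomHorizontalFlip(p=1),
                          transforms.RandomAffine(degrees=20)])
temp_list = []
for x, y in train_loader:
    temp_list.append(aug(x))  # keep every augmented batch
final_tensor = torch.cat(temp_list)  # one tensor holding all augmented images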

This way works. Is there a more efficient way, without using a list?
Thank you for your suggestion.

ariG23498 · May 10, 2021, 4:23pm

Hey @Bhavya_Soni
Why would you concatenate the augmented tensors?
It would be more efficient to augment each batch and send the augmented tensors straight through the model to learn the weights.

The only problem I see here is that a Random* augmentation applied to a whole batch will augment every image in that batch the same way, which is not ideal.
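One way around that, as a minimal sketch: call the augmentation once per image and stack the results, so each image draws its own random parameters. The `random_hflip` function and the `augment_per_sample` helper below are hypothetical names used for illustration only, and the dummy batch stands in for `x` from the loader. Any callable that takes one image tensor, such as the `Compose` above, could be passed as `aug` instead.

```python
import torch


def random_hflip(img: torch.Tensor, p: float = 0.5) -> torch.Tensor:
    """Stand-in augmentation: flip one (C, H, W) image left-right with probability p."""
    if torch.rand(1).item() < p:
        return torch.flip(img, dims=[-1])
    return img


def augment_per_sample(batch: torch.Tensor, aug) -> torch.Tensor:
    """Call `aug` once per image so each image gets its own random draw."""
    return torch.stack([aug(img) for img in batch])


# Dummy batch standing in for x from the train_loader: 8 grayscale 32x32 images.
x = torch.rand(8, 1, 32, 32)
x_aug = augment_per_sample(x, random_hflip)
```

The per-image loop is slower than one batched call, so this trades some speed for variety within each batch.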

ejguan · May 10, 2021, 4:25pm

First, you can set batch_size on the DataLoader to load multiple samples into a batch and apply the transform or augmentation to each batch.
Second, I am not sure why you need temp_list. You can use the augmented data directly to train your model.

for x, y in train_loader:
    aug_x = aug(x)
    ...
    output = model(aug_x)
    loss = ...
    loss.backward()
    ...
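For reference, a filled-in version of that loop might look like the sketch below. Everything beyond the loop structure is an assumption made so that it runs on its own: the random tensors stand in for the saved x-ray data, the linear model, loss, and optimizer are placeholders, and the deterministic flip stands in for the real `aug`.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Dummy data standing in for the saved x-ray tensors: 16 images, 2 classes.
images = torch.rand(16, 1, 32, 32)
labels = torch.randint(0, 2, (16,))
train_loader = DataLoader(TensorDataset(images, labels), batch_size=4, shuffle=True)

# Placeholder model, loss, and optimizer; swap in your own.
model = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 2))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)


def aug(batch):
    """Stand-in for the Compose of augmentations: flip the batch left-right."""
    return torch.flip(batch, dims=[-1])


losses = []
model.train()
for x, y in train_loader:
    aug_x = aug(x)               # augment on the fly; nothing is stored
    optimizer.zero_grad()        # clear gradients from the previous step
    output = model(aug_x)
    loss = criterion(output, y)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
```

Because the augmentation runs inside the loop, a random `aug` produces different variants of each batch in every epoch without any extra memory.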

Bhavya_Soni · May 10, 2021, 4:32pm

Oh yes. I was augmenting the whole dataset randomly first and only then starting training. I get it now. Thank you very much.

Bhavya_Soni · May 10, 2021, 4:33pm

Yes sir, I got it now, thank you very much.

I’m a beginner.

saeid_rasouli · January 5, 2023, 10:17am

For anyone facing this problem, this is the solution: Pytorch, apply different transform to dataset after creation
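One common way to attach a transform to a dataset that already exists (a sketch under assumptions, not necessarily identical to the linked answer) is a thin wrapper dataset that applies the transform in `__getitem__`. The class name `TransformedDataset`, the dummy tensors, and the `flip` function are hypothetical and only there to make the example self-contained.

```python
import torch
from torch.utils.data import DataLoader, Dataset, TensorDataset


class TransformedDataset(Dataset):
    """Hypothetical wrapper: applies `transform` to each image of an existing dataset."""

    def __init__(self, base, transform=None):
        self.base = base
        self.transform = transform

    def __len__(self):
        return len(self.base)

    def __getitem__(self, idx):
        x, y = self.base[idx]
        if self.transform is not None:
            x = self.transform(x)  # runs per sample, on every access
        return x, y


def flip(img):
    """Stand-in for a torchvision transform: flip one image left-right."""
    return torch.flip(img, dims=[-1])


# Dummy stand-in for the already created train_dataset.
train_dataset = TensorDataset(torch.rand(10, 1, 32, 32), torch.randint(0, 2, (10,)))
augmented = TransformedDataset(train_dataset, transform=flip)
train_loader = DataLoader(augmented, batch_size=4, shuffle=True)
```

Since the transform is applied one sample at a time, a random transform passed here would give each image its own parameters, and the test dataset can be left unwrapped so it stays unaugmented.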

karar · January 5, 2023, 12:19pm

Thank you for the helpful link: Pytorch, apply different transform to dataset after creation