Applying transformations to train and test data

Hi, I am using the Cats and Dogs dataset by Microsoft. The dataset consists of 25,000 images, where all the images of cats are in the cats folder and all the images of dogs are in the Dogs folder.
I split the data into train, validation, and test sets as shown below:

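import numpy as np
from torch.utils.data import DataLoader, SubsetRandomSampler
from torchvision import datasets, transforms

data_transf = transforms.Compose([
    transforms.CenterCrop(233),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])])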

train_data = datasets.ImageFolder(root='/content/PetImages', transform=data_transf)
valid_size = 0.2
test_size = 0.1

num_train = len(train_data)
indices = list(range(num_train))
np.random.shuffle(indices)
split = int(np.floor(valid_size * num_train))
train_idx, valid_idx = indices[split:], indices[:split]

# define samplers for obtaining training and validation batches

train_sampler = SubsetRandomSampler(train_idx)
valid_sampler = SubsetRandomSampler(valid_idx)

# carve the test set out of the remaining (already shuffled) training indices;
# drawing from range(num_train) again would overlap with valid_idx
num_train = len(train_idx)
split = int(np.floor(test_size * num_train))
train_idx, test_idx = train_idx[split:], train_idx[:split]
train_sampler = SubsetRandomSampler(train_idx)
test_sampler = SubsetRandomSampler(test_idx)

# prepare data loaders

trainloader = DataLoader(dataset=train_data, batch_size=32, drop_last=True, sampler=train_sampler)
validloader = DataLoader(dataset=train_data, batch_size=32, drop_last=True, sampler=valid_sampler)
testloader = DataLoader(dataset=train_data, batch_size=32, drop_last=True, sampler=test_sampler)

Can anyone help me apply data augmentation to the train and validation data without applying it to the test data?

You could create multiple ImageFolders for training, validation, and testing, and pass the corresponding transformation to each of these.
Once these datasets are created, you could reuse the SubsetRandomSampler approach and create the DataLoaders.
Since ImageFolder will load the data lazily, you shouldn’t see a slowdown or any other performance hit.
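
A rough sketch of what this could look like is below. The helper names (split_indices, build_loaders), the seed, and the specific augmentations (RandomResizedCrop, RandomHorizontalFlip, and the 224/256 sizes) are illustrative assumptions, not something taken from your code. It also assumes that both ImageFolder instances list the files in the same order, which should hold as long as they read the same unchanged directory. Note that test_size is taken here as a fraction of the whole dataset rather than of the remaining training set.

```python
import random


def split_indices(n, valid_size=0.2, test_size=0.1, seed=0):
    """Shuffle range(n) once and cut it into disjoint train/valid/test lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_valid = int(valid_size * n)
    n_test = int(test_size * n)
    valid_idx = indices[:n_valid]
    test_idx = indices[n_valid:n_valid + n_test]
    train_idx = indices[n_valid + n_test:]
    return train_idx, valid_idx, test_idx


def build_loaders(root, batch_size=32):
    # Imported here so split_indices can be used without PyTorch installed.
    from torch.utils.data import DataLoader, SubsetRandomSampler
    from torchvision import datasets, transforms

    normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                     std=[0.229, 0.224, 0.225])
    # Augmenting pipeline (illustrative choice of augmentations).
    augment_tf = transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        normalize,
    ])
    # Deterministic pipeline without augmentation.
    plain_tf = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        normalize,
    ])

    # Same folder, different transforms; images are only decoded on access.
    augmented = datasets.ImageFolder(root=root, transform=augment_tf)
    plain = datasets.ImageFolder(root=root, transform=plain_tf)

    train_idx, valid_idx, test_idx = split_indices(len(augmented))

    # Train and validation read from the augmented dataset, test from the plain one.
    trainloader = DataLoader(augmented, batch_size=batch_size,
                             sampler=SubsetRandomSampler(train_idx))
    validloader = DataLoader(augmented, batch_size=batch_size,
                             sampler=SubsetRandomSampler(valid_idx))
    testloader = DataLoader(plain, batch_size=batch_size,
                            sampler=SubsetRandomSampler(test_idx))
    return trainloader, validloader, testloader
```

You would then call something like build_loaders('/content/PetImages'). If you would rather keep validation deterministic, pass plain instead of augmented to the validation loader.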
