Using only a subset of images for training dataloader

shivangi · August 9, 2019, 4:27pm

I am using PyTorch dataloaders to load my data. I have 2 folders, one for real images and the other for fake images (1000 images each).
Here is my current code.

import PIL.ImageOps
from torch.utils.data import Dataset
import torchvision.datasets as dset
from torchvision import transforms
import numpy as np
import cv2

class CustomData(Dataset):
    """
    CustomData dataset
    """

    def __init__(self, name, base_path, transform=None, should_invert=False):
        super().__init__()
        self.base_path = base_path
        self.inputFolderDataset = dset.ImageFolder(root=self.base_path + '/')
        self.transform = transform
        self.should_invert = should_invert
        self.to_tensor = transforms.ToTensor()

    def __getitem__(self, index):
        # Training input images
        input_images = self.inputFolderDataset.imgs
        # Assign label to class
        # 0 for original, 1 for fake
        input_images = [(t[0], 0) if "orig" in t[0] else (t[0], 1) for t in input_images]
        input_img = cv2.imread(input_images[index][0])
        input_img = np.array(input_img, dtype='uint8')
        input_img = cv2.cvtColor(input_img, cv2.COLOR_BGR2RGB)

        if self.should_invert:
            # invert the uint8 RGB array (PIL.ImageOps.invert only accepts PIL images)
            input_img = cv2.bitwise_not(input_img)

        if self.transform is not None:
            input_img_as_tensor = self.transform(input_img)
        else:
            input_img_as_tensor = self.to_tensor(input_img)

        return input_img_as_tensor, input_images[index][1]

    def __len__(self):
        return len(self.inputFolderDataset.imgs)

I have 2 questions:

1. I want to do something like few-shot learning (1-shot, 2-shot, 3-shot) and compare the results. Is it possible to load only a subset of the data during training (without adding/deleting images in my folder)?
2. Also, let's say I am doing 1-shot learning and want to average the results over 10 runs, so for each of these 10 runs I want to randomly select the image used for training. Is it possible to do so?

Nikronic · August 9, 2019, 4:37pm

Hi,
I do not know about x-shot learning, but if you want to get a subset of your dataset, you just need to change the __len__() function to return the maximum size you want.
This function returns the length of your dataset, so if you return a constant or some function of the current length, the loader will only draw that many samples.

    def __len__(self):
        # __len__ must return an int, so use integer division; any number up to the full length will work
        return len(self.inputFolderDataset.imgs) // 10
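
One caveat with shortening __len__(): it keeps only the first N entries, and as far as I know ImageFolder lists its files class by class, so a small N may end up containing a single class. A possible alternative is to pick the indices explicitly and wrap the dataset in torch.utils.data.Subset. The sketch below only shows the index selection in plain Python; the helper name k_shot_indices and the toy label list are assumptions made for illustration, not part of the original code.

```python
import random
from collections import defaultdict


def k_shot_indices(labels, k, seed=None):
    """Pick k random sample indices per class label (hypothetical helper)."""
    rng = random.Random(seed)  # local RNG, leaves the global random state alone
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    chosen = []
    for label in sorted(by_class):
        # rng.sample raises ValueError if a class has fewer than k samples
        chosen.extend(rng.sample(by_class[label], k))
    return sorted(chosen)


# Toy stand-in for the dataset labels: 0 = original, 1 = fake
labels = [0] * 5 + [1] * 5
indices = k_shot_indices(labels, k=2, seed=0)
# With the real dataset, these indices could be passed to
# torch.utils.data.Subset(custom_data, indices) and then to a DataLoader.
```

Because nothing is deleted or moved on disk, the same folder can serve 1-shot, 2-shot and 3-shot experiments just by changing k.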

About your second question, the DataLoader class can solve your problem. It has a ‘shuffle’ argument which creates random batches from your dataset.

Here is a snippet you can use.

from torch.utils.data import DataLoader
# your CustomData is defined here
custom_data = CustomData(*args)  # pass your own constructor arguments
train_loader = DataLoader(dataset=custom_data,
                          batch_size=16,  # batch size
                          shuffle=True,   # reshuffle the samples every epoch
                          num_workers=0)

With shuffle=True, PyTorch draws a new random permutation of the indices every epoch, so each batch will contain different images from epoch to epoch.
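
Note that shuffling changes the order in which images are visited, not which images belong to the dataset. For averaging 1-shot results over 10 runs, one option is to draw a fresh training image per class at the start of every run. The following is a rough sketch in plain Python; draw_one_per_class, train_and_evaluate and the toy label layout are hypothetical names and assumptions, and train_and_evaluate is only a placeholder that returns a dummy score.

```python
import random
import statistics


def draw_one_per_class(labels, rng):
    """Return one randomly chosen index for every class in `labels`."""
    classes = sorted(set(labels))
    return [rng.choice([i for i, y in enumerate(labels) if y == c])
            for c in classes]


def train_and_evaluate(train_indices):
    """Hypothetical stand-in: replace with real training and evaluation."""
    return 0.0  # dummy score, not a real result


# Assumed toy layout: 1000 original (0) and 1000 fake (1) images
labels = [0] * 1000 + [1] * 1000

selections = []
scores = []
for run in range(10):
    rng = random.Random(run)  # a different but reproducible seed per run
    train_indices = draw_one_per_class(labels, rng)
    selections.append(train_indices)
    scores.append(train_and_evaluate(train_indices))

mean_score = statistics.mean(scores)  # average over the 10 runs
```

Seeding each run separately makes every selection reproducible, which helps when comparing 1-shot, 2-shot and 3-shot settings later.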

Good luck