Custom BatchSampler for two-step mini-batch

Hi All,

In the data preparation phase for my network, I read one image at a time and then extract several random patches from it to form my mini-batch. In other words, the data preparation consists of two steps: 1) read an image and 2) extract random patches to form the mini-batch.

What’s the proper way to use BatchSampler to implement this?

Thanks,
Saeed

You could just sample in __getitem__ and stack the patches into the batch dimension.
Here is a small example sampling 5x5 patches. The patches are returned as a 4-dimensional tensor from __getitem__. During training you can push them into the batch dimension with view.

import torch
from torch.utils.data import Dataset, DataLoader


class MyDataset(Dataset):
    def __init__(self):
        self.data = torch.randn(100, 3, 24, 24)
        
    def __getitem__(self, index):
        # Get current image
        image = self.data[index]
        # Sample patches
        patches = self.sample_patches(image)
        
        return patches
    
    def sample_patches(self, image):
        # Placeholder sampling logic; replace with random offsets to get random patches
        size = 5
        patches = []
        for i in range(5):
            patch = image[:, i:i+size, i:i+size]
            patches.append(patch)
        patches = torch.stack(patches)
        
        return patches
    
    def __len__(self):
        return len(self.data)


dataset = MyDataset()

loader = DataLoader(
    dataset,
    batch_size=10,
    shuffle=False,
    num_workers=2
)

loader_iter = iter(loader)
x = next(loader_iter)    # shape: [10, 5, 3, 5, 5]
x = x.view(-1, 3, 5, 5)  # push patches into the batch dim: [50, 3, 5, 5]

@ptrblck I have this requirement:
Out of a big dataset of N samples, I want to randomly select M samples every epoch, with no weighted criteria.
How can we do that?

If you want to create batches from a subset of M samples of the original dataset with N samples, you could use torch.utils.data.Subset and pass the desired indices to this class.
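
A minimal sketch of that approach (the TensorDataset here is just a hypothetical stand-in for your real dataset):

import torch
from torch.utils.data import TensorDataset, Subset, DataLoader

# Hypothetical full dataset with N samples
N = 1000
dataset = TensorDataset(torch.randn(N, 3, 24, 24))

# Draw M random indices and wrap the dataset in a Subset
M = 500
indices = torch.randperm(N)[:M].tolist()
subset = Subset(dataset, indices)

loader = DataLoader(subset, batch_size=10, shuffle=True)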

@ptrblck I need to randomly select M samples every epoch, so the selection should change each time.
Say I have a dataframe Df with 1M rows.
I want to select 500k rows randomly every epoch.
Do I need something like this, i.e. my own sampler?

import torch
from torch.utils.data import Sampler


class YourSampler(Sampler):
    def __init__(self, mask):
        # Boolean mask selecting which samples may be drawn
        self.mask = mask
        self.indices = torch.nonzero(mask, as_tuple=True)[0]

    def __iter__(self):
        return iter(self.indices.tolist())

    def __len__(self):
        return len(self.indices)

Couldn’t you recreate the Subset in each epoch and pass it to a new DataLoader again?
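
For example, a hedged sketch of that per-epoch resampling (reusing the hypothetical dataset and M from the sketch above):

import torch
from torch.utils.data import Subset, DataLoader

num_epochs = 3  # hypothetical
for epoch in range(num_epochs):
    # Recreate the random subset and its DataLoader at the start of each epoch
    indices = torch.randperm(len(dataset))[:M].tolist()
    loader = DataLoader(Subset(dataset, indices), batch_size=10, shuffle=True)
    for batch in loader:
        pass  # training step goes here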

@ptrblck
I now use WeightedRandomSampler with the desired ratio and num_samples set to half of the dataset, which is M.
I assume that every epoch the WeightedRandomSampler will draw a new pool of data according to the given ratio.
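
For reference, a minimal sketch of that setup (the dataset and the uniform weights here are purely illustrative). The sampler's __iter__ is invoked each time the DataLoader is iterated, so a fresh draw does happen every epoch:

import torch
from torch.utils.data import TensorDataset, DataLoader, WeightedRandomSampler

# Hypothetical dataset and per-sample weights (uniform here)
N = 1000
dataset = TensorDataset(torch.randn(N, 10))
weights = torch.ones(N)

# Draw M = N // 2 samples per epoch; the sampler redraws each epoch,
# since it is re-iterated whenever the DataLoader is iterated
sampler = WeightedRandomSampler(weights, num_samples=N // 2, replacement=True)
loader = DataLoader(dataset, batch_size=10, sampler=sampler)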