In the data preparation phase for my network, I read one image at a time and then want to extract several patches from it to form my mini-batch. In other words, data preparation consists of two steps: 1) read an image and 2) extract random patches to form the mini-batch.
What’s the proper way to use BatchSampler to implement this?
You could just sample in __getitem__ and stack the patches into the batch dimension:
Here is a small example that samples 5x5 patches. __getitem__ returns them as a 4-dimensional tensor, so the DataLoader yields a 5-dimensional batch. During training you can fold the patches into the batch dimension with view.
import torch
from torch.utils.data import Dataset, DataLoader

class MyDataset(Dataset):
    def __init__(self):
        # 100 random "images" of shape (3, 24, 24)
        self.data = torch.randn(100, 3, 24, 24)

    def __getitem__(self, index):
        # Get current image
        image = self.data[index]
        # Sample patches
        patches = self.sample_patches(image)
        return patches

    def sample_patches(self, image):
        # Your sampling logic (here: 5 patches along the diagonal)
        size = 5
        patches = []
        for i in range(5):
            patch = image[:, i:i+size, i:i+size]
            patches.append(patch)
        # Shape: (5, 3, 5, 5)
        patches = torch.stack(patches)
        return patches

    def __len__(self):
        return len(self.data)

dataset = MyDataset()
loader = DataLoader(
    dataset,
    batch_size=10,
    shuffle=False,
    num_workers=2
)

loader_iter = iter(loader)
# x has shape (10, 5, 3, 5, 5): 10 images with 5 patches each
x = next(loader_iter)
# Fold the patches into the batch dimension: (50, 3, 5, 5)
x = x.view(-1, 3, 5, 5)
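The example above takes its patches along the diagonal as a placeholder. If the patches should be at random locations, as in the original question, the sampling step could look roughly like the sketch below. The function name sample_random_patches and its parameters are made up for illustration, and it assumes a (C, H, W) image that is at least as large as the patch size.

```python
import torch

def sample_random_patches(image, num_patches=5, size=5, generator=None):
    """Crop num_patches square patches at random locations from a (C, H, W) image."""
    _, height, width = image.shape
    patches = []
    for _ in range(num_patches):
        # Pick the top-left corner so that the patch stays inside the image
        top = torch.randint(0, height - size + 1, (1,), generator=generator).item()
        left = torch.randint(0, width - size + 1, (1,), generator=generator).item()
        patches.append(image[:, top:top + size, left:left + size])
    # Shape: (num_patches, C, size, size)
    return torch.stack(patches)

image = torch.randn(3, 24, 24)
patches = sample_random_patches(image)
print(patches.shape)
```

Calling this from __getitem__ keeps the rest of the pipeline unchanged, since the returned tensor has the same layout as before.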
@ptrblck I have the following requirement:
Out of N samples (N is large), I want to randomly select M samples every epoch to build the batches, with no weighting criteria.
How can we do that?
If you want to create batches from a subset M of the original dataset with N samples, you could use torch.utils.data.Subset and pass the desired indices to this class.
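As a rough sketch of this idea, the indices could be redrawn at the start of every epoch and wrapped in a new Subset. The sizes N and M, the toy TensorDataset, and the helper name make_epoch_loader are assumptions for illustration only.

```python
import torch
from torch.utils.data import DataLoader, Subset, TensorDataset

N, M = 1000, 500  # illustrative sizes, not from the question
dataset = TensorDataset(torch.arange(N))

def make_epoch_loader(dataset, num_samples, batch_size=50):
    # Draw num_samples distinct indices (sampling without replacement)
    indices = torch.randperm(len(dataset))[:num_samples].tolist()
    return DataLoader(Subset(dataset, indices), batch_size=batch_size, shuffle=True)

for epoch in range(2):
    # A new random subset is created for every epoch
    loader = make_epoch_loader(dataset, M)
    seen = torch.cat([batch[0] for batch in loader])
    print(epoch, len(seen), len(torch.unique(seen)))
```

Recreating the DataLoader each epoch is cheap for a map-style dataset, because Subset only stores the index list.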
@ptrblck I need to randomly select M samples every epoch, so the selected subset should change each time.
Say I have a dataframe Df with 1M rows.
I want to randomly select 500k rows every epoch.
Do I need something like this, i.e. my own sampler?
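import torch
from torch.utils.data import Sampler
class YourSampler(Sampler):
    def __init__(self, mask):
        # mask: 1-D tensor whose nonzero entries mark the indices to sample
        self.mask = mask
    def __iter__(self):
        # torch.nonzero returns shape (K, 1), so flatten it to plain ints
        return iter(torch.nonzero(self.mask).flatten().tolist())
    def __len__(self):
        # Number of selected indices, not the length of the mask
        return int((self.mask != 0).sum())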
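A mask-based sampler yields the same indices every epoch unless the mask is changed from outside. One possible alternative, sketched here under the assumption that uniform sampling without replacement is wanted, is to draw a fresh permutation inside __iter__. The class name RandomSubsetSampler and the toy sizes are hypothetical.

```python
import torch
from torch.utils.data import DataLoader, Sampler, TensorDataset

class RandomSubsetSampler(Sampler):
    """Hypothetical sampler: yields num_samples distinct random indices per pass."""

    def __init__(self, data_len, num_samples):
        self.data_len = data_len
        self.num_samples = num_samples

    def __iter__(self):
        # Called once per epoch by the DataLoader, so each epoch gets a new subset
        perm = torch.randperm(self.data_len)[:self.num_samples]
        return iter(perm.tolist())

    def __len__(self):
        return self.num_samples

dataset = TensorDataset(torch.arange(1000))
sampler = RandomSubsetSampler(len(dataset), num_samples=500)
loader = DataLoader(dataset, batch_size=50, sampler=sampler)

epochs = [torch.cat([batch[0] for batch in loader]) for _ in range(2)]
print([len(e) for e in epochs])
```

Note that sampler and shuffle=True cannot be combined in DataLoader, so the shuffling has to come from the sampler itself.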
@ptrblck
I now use WeightedRandomSampler with the desired ratio and the number of drawn samples set to half of the dataset, which is M.
I assume WRS will draw a new pool of samples every epoch according to the given ratio.
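A quick way to check this assumption is to iterate the sampler twice and compare the drawn indices. The weights and sizes below are made up for illustration. With replacement=True the same index can appear more than once within an epoch.

```python
import torch
from torch.utils.data import WeightedRandomSampler

# Toy weights: the first 50 samples are 9x more likely than the last 50
weights = torch.tensor([0.9] * 50 + [0.1] * 50)
sampler = WeightedRandomSampler(weights, num_samples=50, replacement=True)

# Each pass over the sampler corresponds to one epoch
first_epoch = list(sampler)
second_epoch = list(sampler)
print(first_epoch == second_epoch)
print(len(set(first_epoch)), len(set(second_epoch)))
```

Each iteration triggers a new draw from the weight distribution, so the two index lists will generally differ.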