Hi, I have a training set which I want to divide into batches of variable sizes based on an index list (for instance, batch 1 would contain data with indices 1 to 100, batch 2 indices 101 to 129, batch 3 indices 130 to 135, …). I checked DataLoader, but it seems to only support fixed-size batches. I wonder what would be a good way to do this?
Why don’t you shuffle your data and drop the last samples?
Because I want to keep the order fixed, so that a specific batch contains exactly the data specified by the index list. For my example above, batch 1 should only contain the data with indices 1 to 100, not 100 random data points. The same goes for batches 2, 3, …
Do you know these lengths beforehand?
If so, you could use these indices to slice your data inside a custom Dataset, set batch_size=1 in the DataLoader, and view the loaded data to remove the extra batch dimension:
import torch
from torch.utils.data import Dataset, DataLoader

class MyDataset(Dataset):
    def __init__(self):
        self.data = torch.randn(250, 1)
        # boundaries of the variable-sized batches
        self.batch_indices = [0, 100, 129, 150, 200, 250]

    def __getitem__(self, index):
        # return one whole "batch" as a single sample
        start_idx = self.batch_indices[index]
        end_idx = self.batch_indices[index + 1]
        data = self.data[start_idx:end_idx]
        return data

    def __len__(self):
        return len(self.batch_indices) - 1

dataset = MyDataset()
loader = DataLoader(dataset, batch_size=1)
for data in loader:
    data = data.view(-1, 1)  # remove the extra batch dimension of size 1
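As an alternative, DataLoader does support variable-sized batches directly via its batch_sampler argument, which accepts any iterable yielding lists of indices. Here is a minimal sketch of that approach; the names PlainDataset, boundaries, and batches are just placeholders for this example:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class PlainDataset(Dataset):
    """An ordinary per-sample Dataset; batching is handled by the sampler."""
    def __init__(self):
        self.data = torch.randn(250, 1)

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return len(self.data)

# Boundaries of the desired batches; each inner list is one batch of indices,
# so the batch sizes can all differ.
boundaries = [0, 100, 129, 150, 200, 250]
batches = [list(range(start, end)) for start, end in zip(boundaries, boundaries[1:])]

loader = DataLoader(PlainDataset(), batch_sampler=batches)
for batch in loader:
    print(batch.shape)  # batches of 100, 29, 21, 50, 50 samples
```

This keeps the Dataset itself simple and avoids the extra view call, since no dummy batch dimension is added.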