Hi, I have a training set which I want to divide into batches of variable sizes based on an index list (for instance, batch 1 would contain data with indices 1 to 100, batch 2 indices 101 to 129, batch 3 indices 130 to 135, …). I checked DataLoader, but it seems to only support fixed-size batches. I wonder what would be a good way to do this?
Why don’t you shuffle your data and drop the last samples?
Because I want to keep the order fixed, so that a specific batch contains exactly the data specified by the index list. For my example above, batch 1 should only contain the data with indices 1 to 100, not 100 random data points. The same goes for batches 2, 3, …
Do you know these lengths beforehand?
If so, you could use these indices to slice your data inside a custom Dataset, set batch_size=1 in the DataLoader, and view the loaded data to remove the extra batch dimension:
import torch
from torch.utils.data import Dataset, DataLoader

class MyDataset(Dataset):
    def __init__(self):
        self.data = torch.randn(250, 1)
        # boundaries of the variable-sized batches
        self.batch_indices = [0, 100, 129, 150, 200, 250]

    def __getitem__(self, index):
        # return one whole "batch" as a single sample
        start_idx = self.batch_indices[index]
        end_idx = self.batch_indices[index + 1]
        data = self.data[start_idx:end_idx]
        return data

    def __len__(self):
        return len(self.batch_indices) - 1

dataset = MyDataset()
loader = DataLoader(dataset, batch_size=1)
for data in loader:
    data = data.view(-1, 1)  # remove the extra batch dimension of size 1
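As an alternative, DataLoader does support variable-sized batches directly via its batch_sampler argument, which accepts any iterable yielding lists of indices. Here is a minimal sketch of that approach; the names PlainDataset, boundaries, and batches are just placeholders for this example:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class PlainDataset(Dataset):
    """An ordinary per-sample Dataset; batching is handled by the sampler."""
    def __init__(self):
        self.data = torch.randn(250, 1)

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return len(self.data)

# Boundaries of the desired batches; each inner list is one batch of indices,
# so the batch sizes can all differ.
boundaries = [0, 100, 129, 150, 200, 250]
batches = [list(range(start, end)) for start, end in zip(boundaries, boundaries[1:])]

loader = DataLoader(PlainDataset(), batch_sampler=batches)
for batch in loader:
    print(batch.shape)  # batches of 100, 29, 21, 50, 50 samples
```

This keeps the Dataset itself simple and avoids the extra view call, since no dummy batch dimension is added.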