My training data has a size of 72200, and I want to split it into 5 portions so that I can feed it in without maxing out the RAM during training. I have split my data using SubsetRandomSampler, but when I check the lengths of the subsets, it's not 14440 (72200 / 5), it's 57.
from torch.utils.data import DataLoader
from torch.utils.data.sampler import SubsetRandomSampler

print(len(train_loader))
num_partitions = 5
total = len(train_loader)
partitions = [int(i * total / num_partitions) for i in range(num_partitions + 1)]
print(partitions)

indices = range(total)
for j in range(len(partitions) - 1):
    samples = indices[partitions[j]:partitions[j + 1]]
    train_sampler = SubsetRandomSampler(samples)
    train_loader_p = DataLoader(MyDataset(train, 12), batch_size=256,
                                num_workers=8, pin_memory=True,
                                sampler=train_sampler)
    print(len(train_loader_p), total)
This is the output I am getting:
72200
[0, 14440, 28880, 43320, 57760, 72200]
57 72200
57 72200
57 72200
57 72200
57 72200
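For reference, the same lengths come out of a self-contained reproduction, where DummyDataset is a stand-in for my MyDataset (which I haven't included here):

```python
import torch
from torch.utils.data import DataLoader, Dataset, SubsetRandomSampler

class DummyDataset(Dataset):
    """Minimal stand-in for MyDataset: 72200 dummy samples."""
    def __init__(self, n):
        self.n = n
    def __len__(self):
        return self.n
    def __getitem__(self, idx):
        return torch.zeros(1), 0

total = 72200
# first of the 5 partitions: indices 0..14439
sampler = SubsetRandomSampler(range(0, 14440))
loader = DataLoader(DummyDataset(total), batch_size=256, sampler=sampler)

print(len(sampler))  # 14440
print(len(loader))   # 57
```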
The subsets have a length of 57 instead of 14440. I have checked the length of samples, and it's 14440. Where am I going wrong? Also, is there a better way to do the same using random_split?
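For the random_split part, this untested sketch is roughly what I had in mind (TensorDataset here is just a toy stand-in for my real dataset; the five lengths must sum to the dataset size):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# toy dataset standing in for the real 72200-sample one
dataset = TensorDataset(torch.zeros(72200, 1))

# split into 5 equal, disjoint, randomly assigned parts
parts = random_split(dataset, [14440] * 5)

# one DataLoader per part, to be iterated one at a time
loaders = [DataLoader(p, batch_size=256, num_workers=8, pin_memory=True)
           for p in parts]
```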