I am trying to figure out how to write a custom iterable dataloader for multiple folders, each holding a different sequence. The issue I am currently facing is returning the sequences disjointly, so that a batch never contains images from more than one sequence.
For example: let's take two sequences (folders) with 340 and 500 images, and a batch size of 32. When iterating over the first folder, my last batch should have only the remaining 20 images (an incomplete final batch, like the one the drop_last option deals with), and the next batch should then start with the next folder.
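To make the expected result concrete, here is a small sketch of the batch sizes I would expect for this example. `expected_batch_sizes` is a hypothetical helper written only for illustration; it is not part of PyTorch.

```python
def expected_batch_sizes(seq_lengths, batch_size):
    """Return the batch sizes when batches never cross a sequence boundary."""
    sizes = []
    for n in seq_lengths:
        full, rest = divmod(n, batch_size)
        sizes.extend([batch_size] * full)
        if rest:
            sizes.append(rest)  # partial last batch of this sequence
    return sizes


sizes = expected_batch_sizes([340, 500], batch_size=32)
print(sizes[:11])  # first folder: ten batches of 32, then one batch of 20
print(len(sizes))  # total number of batches over both folders
```

So the first folder should give ten full batches plus one batch of 20, and only then should images from the second folder appear.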
import os

import torch
from torch.utils.data import DataLoader, IterableDataset


class myiter(IterableDataset):
    def __init__(self, data_dir, sequences):
        self.data_dir = data_dir
        self.seqs = sequences
        # One sorted list of image file names per sequence folder.
        self.img_paths = []
        for seq in self.seqs:
            self.img_paths.append(sorted(os.listdir(os.path.join(self.data_dir, seq))))

    def process_data(self):
        # Yields every image of every sequence as one continuous stream.
        for seq in self.img_paths:
            for image in seq:
                yield image

    def __iter__(self):
        return self.process_data()
With the dummy script above I am able to iterate continuously, but the sequences are distinct from each other and should not end up in the same batch.
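A minimal sketch of the behaviour I am after, assuming each sequence is already a sorted list of file names as in `self.img_paths`. The generator `batches_per_sequence` and the stand-in file names are hypothetical, and this is only one possible approach: the batching is done per sequence inside the generator instead of by the loader.

```python
from itertools import islice


def batches_per_sequence(img_paths, batch_size):
    """Yield lists of items, never mixing two sequences in one batch."""
    for seq in img_paths:
        it = iter(seq)
        while True:
            # Take up to batch_size items from the current sequence only.
            batch = list(islice(it, batch_size))
            if not batch:
                break  # sequence exhausted, move on to the next folder
            yield batch


# Stand-in file names for two folders with 340 and 500 images.
img_paths = [
    [f"seq_a/{i:04d}.png" for i in range(340)],
    [f"seq_b/{i:04d}.png" for i in range(500)],
]
batches = list(batches_per_sequence(img_paths, batch_size=32))
```

If `__iter__` returned a generator like this, the dataset would yield whole batches itself, and passing `batch_size=None` to `DataLoader` would disable its automatic batching so that those batches are not regrouped across folders.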