Iterable Dataset from multiple different sequences

I am trying to figure out how to write a custom iterable dataloader for multiple folders of different sequences. The issue I am currently facing is how to return the different sequences disjointly, so that a batch never contains images from more than one sequence.

For example: let's take two sequences (folders) with 340 and 500 images, and a batch size of 32. When iterating over the first folder, my last batch should contain only the remaining 20 images (340 = 10 * 32 + 20), much like the partial final batch you get with drop_last=False, and iteration should then continue with the next folder.

import os

import torch
from torch.utils.data import DataLoader, IterableDataset

class myiter(IterableDataset):

  def __init__(self, data_dir, sequences):
    self.data_dir = data_dir
    self.seqs = sequences
    # One sorted list of image file names per sequence folder
    self.img_paths = []
    for seq in self.seqs:
      self.img_paths.append(sorted(os.listdir(os.path.join(self.data_dir, seq))))

  def process_data(self):
    # Yields file names one sequence after another, with no break between folders
    for seq in self.img_paths:
      for image in seq:
        yield image

  def __iter__(self):
    return self.process_data()

With the dummy script above I am able to iterate continuously, but the sequences are different from each other, so a batch should not mix images from two folders.

@ptrblck could you provide me with any leads on this? Thank you.

I think writing a custom BatchSampler using the “split logic” might be the easiest approach to be able to yield batches from both datasets but make sure that the samples are not overlapping.
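As a rough illustration of that idea, here is a minimal sketch of such a batch sampler. The class name SequenceBatchSampler and its arguments are made up for this example, and it assumes the images of all folders are exposed through one map-style Dataset whose indices run through the first sequence and then the second (as far as I know, DataLoader does not accept batch_sampler together with an IterableDataset). The sampler only produces lists of indices, so it is written as a plain Python iterable.

```python
class SequenceBatchSampler:
    """Yield batches of flat indices that never cross a sequence boundary.

    Hypothetical helper: `seq_lengths` holds the number of images per
    sequence, in the same order the dataset concatenates them.
    """

    def __init__(self, seq_lengths, batch_size):
        self.seq_lengths = list(seq_lengths)
        self.batch_size = batch_size

    def __iter__(self):
        offset = 0  # flat index of the first sample of the current sequence
        for length in self.seq_lengths:
            for start in range(0, length, self.batch_size):
                # The last batch of a sequence is cut at the sequence end
                end = min(start + self.batch_size, length)
                yield list(range(offset + start, offset + end))
            offset += length

    def __len__(self):
        # ceil(length / batch_size) batches per sequence
        return sum(-(-length // self.batch_size) for length in self.seq_lengths)


# The example from the question: 340 and 500 images, batch size 32
sampler = SequenceBatchSampler([340, 500], batch_size=32)
batch_sizes = [len(batch) for batch in sampler]
```

With a map-style dataset in place, this could be passed as DataLoader(dataset, batch_sampler=sampler); batch_size, shuffle, sampler and drop_last must then be left at their defaults. Shuffling within a sequence, if needed, would have to be added inside __iter__.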

Noted @ptrblck. I will get back to you if I need any further assistance. Thank you.