Processing the 1st batch is extremely slow

We are using the library to load a large number of TFRecord files; the code looks like this:

datapipes = []
for path in paths:
    # Reset fsspec's global IO thread and event loop so forked worker
    # processes do not inherit stale state from the parent
    fsspec.asyn.iothread[0] = None
    fsspec.asyn.loop[0] = None
    # source datapipe per path (IterableWrapper assumed; elided in the original)
    datapipe = IterableWrapper([path]).open_files_by_fsspec(mode='rb')
    datapipe = datapipe.decompress(file_type=compression_type)
    datapipe = datapipe.load_from_tfrecord()
    datapipe = datapipe.cycle(num_epochs)
    datapipes.append(datapipe)
pipes_to_weights_dict = dict(zip(datapipes, dir_weights))
datapipe = SampleMultiplexer(pipes_to_weights_dict)
datapipe =

rs = MultiProcessingReadingService(
    num_workers=num_parallel_calls,
    worker_prefetch_cnt=prefetch_size // num_parallel_calls,
)

data_loader = DataLoader2(datapipe, reading_service=rs)

for i, e in enumerate(data_loader):
    ...  # training step

We found that it takes 10+ minutes to process the 1st batch. After spending some time adding debug messages, we found that a print placed just before the `for` loop is emitted immediately, while a print inside the loop body does not appear for a long time.
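To narrow this down, one rough check (plain stdlib, nothing torchdata-specific; `data_loader` stands for the loader built above) is to time iterator creation separately from the first `next()` call, since with `DataLoader2` the reading service is expected to start its worker processes around iterator creation:

```python
import time

def time_first_batch(loader):
    """Report how long iterator creation takes vs. producing the first item."""
    t0 = time.perf_counter()
    it = iter(loader)        # worker startup is expected to happen around here
    t1 = time.perf_counter()
    first = next(it)         # the first batch is actually materialized here
    t2 = time.perf_counter()
    print(f"iter(): {t1 - t0:.1f}s, first next(): {t2 - t1:.1f}s")
    return first
```

If `iter()` dominates, the cost is likely in worker startup and datapipe graph setup; if the first `next()` dominates, the pipeline itself is slow to produce data.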

I also tried a dataset with only a few TFRecord files; there the 1st batch is produced quickly.
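Another way to isolate the cause (a sketch; `datapipe` and `data_loader` refer to the code above) is to pull a few items from the datapipe directly in the main process, bypassing the reading service, and compare that with going through `DataLoader2`:

```python
import itertools
import time

def time_n_items(iterable, n=1):
    """Pull n items from any iterable and return (items, elapsed_seconds)."""
    t0 = time.perf_counter()
    items = list(itertools.islice(iterable, n))
    return items, time.perf_counter() - t0

# Hypothetical comparison (datapipe / data_loader come from the code above):
# items, dt_direct = time_n_items(datapipe, n=1)     # no worker processes
# items, dt_loader = time_n_items(data_loader, n=1)  # via the reading service
```

If direct iteration is already slow with many files, the startup cost lies in the datapipe graph itself (e.g. opening and cycling many files) rather than in the multiprocessing layer.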

  1. Can I get some help to narrow down the root cause of the long processing time for the 1st batch?
  2. The library seems not to be actively developed. Can you share more about the roadmap for the PyTorch data loader? Should we keep using this library or switch to something else?