Prefetch_factor not prefetching?

wmpauli · May 4, 2022, 12:27am

I have an iterable dataset with the following __iter__() method.

def __iter__(self):
    for track in self.selected_tracks:
        samples = self.load_track(track)
        for sample in samples:
            yield sample

where load_track loads a number of samples, which takes a long time.

I use a standard DataLoader and iterate over it with for sample in dataloader. I’m able to run this with several persistent workers. However, I noticed that in between calls to load_track, all the workers just sit idle. I thought I could set prefetch_factor to a number larger than the number of samples returned by load_track, but it doesn’t seem to have any effect. Am I understanding prefetch_factor correctly, in that it loads samples ahead of the next iteration over the dataloader?
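
For reference, the PyTorch documentation describes prefetch_factor as the number of batches loaded in advance by each worker, so it counts batches rather than samples. Below is a rough sketch of that arithmetic. The helper name samples_in_flight is hypothetical (it is not part of PyTorch), and the result is only an approximate ceiling on how much data can be queued ahead of the training loop.

```python
def samples_in_flight(num_workers: int, prefetch_factor: int, batch_size: int) -> int:
    """Approximate ceiling on samples queued ahead of the training loop.

    Hypothetical helper for illustration only. It assumes each worker keeps
    up to `prefetch_factor` batches ready, as the DataLoader docs describe.
    """
    return num_workers * prefetch_factor * batch_size


# Example: 4 workers, the default prefetch_factor of 2, batches of 32 samples.
print(samples_in_flight(num_workers=4, prefetch_factor=2, batch_size=32))  # 256
```

Note that each worker runs __iter__ sequentially, so while a worker is blocked inside load_track it cannot produce any batches, whatever prefetch_factor is set to.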

lkc1 · July 11, 2022, 1:06am

I’m seeing similar behavior. Changing prefetch_factor does not change the latency of fetching batches at all.

wmpauli · July 29, 2022, 6:30pm

@lkc1, I did not dig much deeper into this issue, because I realized why the workers were sitting idle: I was pulling data from a very large SQL table, and they were all just waiting for the data.
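
If the wait is I/O-bound like this, one possible workaround is to start loading the next track in a background thread while the samples of the current track are being yielded. This is only a minimal sketch under assumptions: iter_with_lookahead is a hypothetical helper, load_track is assumed to be safe to call from another thread, and it will not help if the database itself is the bottleneck.

```python
from concurrent.futures import ThreadPoolExecutor


def iter_with_lookahead(tracks, load_track):
    """Yield samples track by track, loading the next track in the background.

    Hypothetical helper: `load_track` is any callable that returns an
    iterable of samples for one track.
    """
    tracks = list(tracks)
    if not tracks:
        return
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Kick off the first load, then always stay one track ahead.
        pending = pool.submit(load_track, tracks[0])
        for next_track in tracks[1:] + [None]:
            samples = pending.result()  # blocks only if the load is unfinished
            if next_track is not None:
                pending = pool.submit(load_track, next_track)
            for sample in samples:
                yield sample
```

Inside the dataset, __iter__ could then delegate with yield from iter_with_lookahead(self.selected_tracks, self.load_track). Sample order is unchanged; only the timing of the loads differs.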