I have an `IterableDataset` with the following `__iter__` method:
```python
def __iter__(self):
    for track in self.selected_tracks:
        samples = self.load_track(track)
        for sample in samples:
            yield sample
```
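For anyone who wants to reproduce this, here is a self-contained sketch of the dataset above. `TrackDataset`, the sleep-based `load_track` stand-in, and its `samples_per_track`/`load_delay` parameters are hypothetical. The sketch also adds per-worker sharding with `get_worker_info()`, which the original `__iter__` does not have: with an `IterableDataset`, every worker gets its own copy of the dataset, so without sharding each worker iterates over all of `selected_tracks` and every sample is yielded once per worker.

```python
import time
from typing import Iterator, List, Tuple

from torch.utils.data import IterableDataset, get_worker_info


class TrackDataset(IterableDataset):
    """Hypothetical stand-in for the dataset above: yields every sample of every track."""

    def __init__(self, selected_tracks: List[str], samples_per_track: int = 4,
                 load_delay: float = 0.0):
        super().__init__()
        self.selected_tracks = selected_tracks
        self.samples_per_track = samples_per_track
        self.load_delay = load_delay

    def load_track(self, track: str) -> List[Tuple[str, int]]:
        # Hypothetical slow loader: sleep to mimic I/O and decoding, then return all samples at once.
        time.sleep(self.load_delay)
        return [(track, i) for i in range(self.samples_per_track)]

    def __iter__(self) -> Iterator[Tuple[str, int]]:
        info = get_worker_info()
        if info is None:
            # Single-process loading (num_workers=0): iterate over every track.
            tracks = self.selected_tracks
        else:
            # Added sharding: each worker takes a disjoint, strided subset of tracks.
            tracks = self.selected_tracks[info.id::info.num_workers]
        for track in tracks:
            samples = self.load_track(track)  # this worker blocks until the whole track is loaded
            for sample in samples:
                yield sample
```

Iterating it directly, e.g. `list(TrackDataset(["a", "b"], samples_per_track=2))`, runs in the main process, where `get_worker_info()` returns `None`, so all tracks are visited in order.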
`load_track` loads a number of samples, which takes a long time.
I use a standard `DataLoader` and iterate with `for sample in dataloader`. This runs fine with several persistent workers. However, I noticed that between calls to `load_track`, all the workers are just sitting idle. I thought I could set `prefetch_factor` to a number larger than the number of samples returned by `load_track`, but it doesn't seem to have any effect. Am I understanding `prefetch_factor` correctly, in the sense that it loads samples for the next iterations of the dataloader in advance?
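A minimal experiment for checking this, assuming a recent PyTorch (2.x). The PyTorch docs describe `prefetch_factor` as the number of batches loaded in advance by each worker, so `prefetch_factor * num_workers` batches in total; whether that covers one track's worth of samples therefore depends on `batch_size` and `num_workers` as well. `SlowTrackDataset`, `prefetch_budget`, `make_loader`, and `time_gaps` are hypothetical helpers: the dataset sleeps for `load_delay` seconds per track in place of `load_track`, and `time_gaps` records how long the loop waits for each item, so long gaps show where it stalls.

```python
import time
from typing import Iterator, List, Optional, Tuple

from torch.utils.data import DataLoader, IterableDataset, get_worker_info


class SlowTrackDataset(IterableDataset):
    """Hypothetical dataset: each track takes `load_delay` seconds to load, then yields its samples."""

    def __init__(self, n_tracks: int, samples_per_track: int, load_delay: float):
        super().__init__()
        self.tracks = list(range(n_tracks))
        self.samples_per_track = samples_per_track
        self.load_delay = load_delay

    def __iter__(self) -> Iterator[Tuple[int, int]]:
        info = get_worker_info()
        # Shard tracks across workers so no sample is duplicated.
        tracks = self.tracks if info is None else self.tracks[info.id::info.num_workers]
        for track in tracks:
            time.sleep(self.load_delay)  # stand-in for the slow load_track()
            for i in range(self.samples_per_track):
                yield (track, i)


def prefetch_budget(num_workers: int, prefetch_factor: int, batch_size: Optional[int]) -> int:
    """Roughly how many samples can be requested ahead of the loop, per the documented
    meaning of prefetch_factor (batches loaded in advance per worker)."""
    samples_per_batch = 1 if batch_size is None else batch_size
    return num_workers * prefetch_factor * samples_per_batch


def make_loader(dataset: IterableDataset, num_workers: int, prefetch_factor: Optional[int],
                batch_size: Optional[int] = None) -> DataLoader:
    kwargs = {}
    if num_workers > 0:
        # persistent_workers and prefetch_factor only apply when worker processes are used.
        kwargs["persistent_workers"] = True
        if prefetch_factor is not None:
            kwargs["prefetch_factor"] = prefetch_factor
    return DataLoader(dataset, batch_size=batch_size, num_workers=num_workers, **kwargs)


def time_gaps(loader: DataLoader) -> List[float]:
    """Seconds the loop waits for each item (the first gap includes worker startup)."""
    gaps, last = [], time.perf_counter()
    for _ in loader:
        now = time.perf_counter()
        gaps.append(now - last)
        last = now
    return gaps


if __name__ == "__main__":
    ds = SlowTrackDataset(n_tracks=4, samples_per_track=5, load_delay=0.2)
    loader = make_loader(ds, num_workers=2, prefetch_factor=8)
    print("prefetch budget (samples):", prefetch_budget(2, 8, None))
    for epoch in range(2):
        gaps = time_gaps(loader)
        print(f"epoch {epoch}: items={len(gaps)}, max gap={max(gaps):.2f}s, total={sum(gaps):.2f}s")
```

Raising `load_delay` or `prefetch_factor` in the `__main__` block and comparing the max gap between runs shows whether the extra prefetching actually hides the track-loading time for a given `batch_size` and `num_workers`.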