I have an iterable dataset with the following __iter__() method:

def __iter__(self):
    for track in self.selected_tracks:
        samples = self.load_track(track)
        for sample in samples:
            yield sample

where load_track loads a number of samples, which takes a long time.
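For reference, here is a self-contained sketch of a dataset with the same __iter__ structure. The class name TrackDataset, the parameters samples_per_track and load_delay, and the body of load_track are hypothetical stand-ins, since the real load_track is not shown. One assumption goes beyond the original code: PyTorch replicates an IterableDataset in every worker process, so the sketch splits selected_tracks across workers with get_worker_info(). Without a split like this, every worker would yield the same samples.

```python
import time

from torch.utils.data import DataLoader, IterableDataset, get_worker_info


class TrackDataset(IterableDataset):
    """Hypothetical dataset: yields every sample of every selected track."""

    def __init__(self, selected_tracks, samples_per_track=4, load_delay=0.0):
        self.selected_tracks = list(selected_tracks)
        self.samples_per_track = samples_per_track
        self.load_delay = load_delay  # simulates how slow load_track is

    def load_track(self, track):
        # Hypothetical stand-in for the real (slow) loader: sleeps, then
        # returns a list of samples for this track.
        time.sleep(self.load_delay)
        return [(track, i) for i in range(self.samples_per_track)]

    def __iter__(self):
        info = get_worker_info()
        if info is None:
            # Single-process loading (num_workers=0): iterate every track.
            tracks = self.selected_tracks
        else:
            # Each worker holds its own copy of the dataset, so give each
            # one a disjoint, round-robin subset of the tracks.
            tracks = self.selected_tracks[info.id::info.num_workers]
        for track in tracks:
            samples = self.load_track(track)
            yield from samples  # same as the inner for/yield loop
```

For example, iterating DataLoader(TrackDataset(list(range(6)), samples_per_track=3), batch_size=None) yields 18 samples whether num_workers is 0 or 2, because the tracks are divided between workers rather than duplicated.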
I use a standard DataLoader and iterate over it with for sample in dataloader. I'm able to run this with several persistent workers. However, I noticed that between calls to load_track, all the workers are just sitting idle. I thought I could set prefetch_factor to a value larger than the number of samples returned by load_track, but it doesn't seem to have any effect. Am I understanding prefetch_factor correctly, i.e. that it loads samples for the upcoming iterations of the dataloader in advance?
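A minimal sketch of the loader setup described above, with the worker count, batch size and prefetch_factor chosen arbitrarily for illustration; make_loader and samples_in_flight are hypothetical helpers. According to the DataLoader documentation, prefetch_factor is the number of batches, not samples, that each worker loads in advance, so at most num_workers * prefetch_factor batches (num_workers * prefetch_factor * batch_size samples) are requested ahead of the training loop.

```python
from torch.utils.data import DataLoader


def samples_in_flight(num_workers, prefetch_factor, batch_size):
    # prefetch_factor counts batches per worker, so the upper bound on
    # samples requested in advance is the product of all three settings.
    return num_workers * prefetch_factor * batch_size


def make_loader(dataset, num_workers=4, prefetch_factor=8, batch_size=16):
    # Hypothetical settings; tune them for the real workload.
    return DataLoader(
        dataset,
        batch_size=batch_size,
        num_workers=num_workers,          # persistent_workers needs > 0
        persistent_workers=True,          # keep workers alive between epochs
        prefetch_factor=prefetch_factor,  # batches loaded ahead per worker
    )
```

With these defaults, samples_in_flight(4, 8, 16) gives 512 samples requested in advance across all workers.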