Where and how does prefetch_factor work?

A simple training loop:
image
During training, I found that there is a long wait at regular intervals, and the length of the interval (in batches) corresponds to the value of num_workers. In the DataLoader, prefetch_factor is 2, so I expected the cycle to be prefetch_factor * num_workers.

I commented out the computation in picture 1, and the phenomenon became more obvious.

I found the relevant part of the DataLoader source code:
image

image
I'm confused.
Where does prefetch_factor take effect?
Does it apply throughout training, or only when the DataLoader is initialized, as the picture above shows?

What is the number of workers here and have you tried increasing it? I’m curious if the number of workers isn’t enough to keep up with the device’s training throughput; note that if the selected number of workers cannot keep up in terms of throughput, prefetching alone cannot address the underlying issue as the workers will always be busy.
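
To illustrate the point about throughput, here is a rough pure-Python model of the loader, not the actual DataLoader code. It assumes that batches are assigned to workers round-robin, that at most prefetch_factor * num_workers batches are outstanding at any time, and that one new batch is requested each time a batch is handed to the training loop. The function names and the time units are made up for this sketch.

```python
def simulate_waits(num_batches, num_workers, prefetch_factor, load_time, step_time):
    """Return how long the training loop waits for each batch (simplified model)."""
    max_outstanding = prefetch_factor * num_workers
    # Time at which each batch is requested; the first max_outstanding
    # batches are requested up front, at time 0.
    dispatch = [0] * num_batches
    worker_free = [0] * num_workers  # time at which each worker becomes idle
    waits = []
    now = 0  # current time as seen by the training loop
    for i in range(num_batches):
        w = i % num_workers  # assumed round-robin assignment
        start = max(dispatch[i], worker_free[w])
        ready = start + load_time
        worker_free[w] = ready
        got = max(ready, now)  # batches are consumed in order
        waits.append(got - now)
        # Receiving a batch frees one slot, so one more batch is requested.
        if i + max_outstanding < num_batches:
            dispatch[i + max_outstanding] = got
        now = got + step_time
    return waits


def stall_indices(waits):
    """Indices of the batches the training loop had to wait for."""
    return [i for i, wait in enumerate(waits) if wait > 0]


if __name__ == "__main__":
    # Loading is the bottleneck: 4 workers, 10 time units per batch, 1 per step.
    print(stall_indices(simulate_waits(16, 4, 2, 10, 1)))
    # A larger prefetch_factor does not change the stall pattern.
    print(stall_indices(simulate_waits(16, 4, 4, 10, 1)))
    # Workers keep up: only the very first batch has to be waited for.
    print(stall_indices(simulate_waits(16, 4, 2, 10, 5)))
```

In this model the stalls repeat every num_workers batches whenever loading is slower than the training step, regardless of prefetch_factor. Prefetching keeps requests queued for the whole run, but it cannot make the workers produce batches any faster.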