From what I understand, the worker processes of the DataLoader fetch batches instead of individual samples. Is there a way to fetch samples instead of batches?
Also, when setting
num_workers > 0, each worker by default prefetches 2 samples in advance. I don't understand exactly how this works. Does each worker prepare a batch and read 2 samples ahead for the next batch? Do the workers prefetch samples for the next training epoch before it starts?
The DataLoader fetches batches so that all the preprocessing and batch creation can happen on the worker process, leaving as little as possible for the main process to do once the batch is ready.
Why would you want workers to load samples only?
Each worker prefetches 2 batches in advance to make sure that when the main process asks for the next batch, one is always ready.
Note that if you use a nightly build, you can control that number with the
prefetch_factor argument to the DataLoader (doc here: https://pytorch.org/docs/master/data.html#torch.utils.data.DataLoader)
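As a minimal sketch of that argument (assuming a PyTorch version where `prefetch_factor` is available; `ToyDataset` is just an illustrative stand-in, not part of the library):

```python
from torch.utils.data import DataLoader, Dataset

class ToyDataset(Dataset):
    """Hypothetical dataset that just returns its index."""
    def __len__(self):
        return 100

    def __getitem__(self, idx):
        return idx

# prefetch_factor is per worker: with num_workers=2 and
# prefetch_factor=4, up to 2 * 4 = 8 batches are prepared ahead
# of the main process. It is only valid when num_workers > 0.
loader = DataLoader(ToyDataset(), batch_size=8, num_workers=2, prefetch_factor=4)
```

Nothing is loaded at construction time; the prefetching only begins once you start iterating over the loader.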
Thank you very much for your answer!
Ok, I now understand why the workers fetch batches instead of samples.
Do the workers fetch batches for the next epoch before it starts, or do the batches of an epoch only start being fetched once the epoch begins?
On another subject, I noticed that when I choose
batch_size=64, worker 1 reads the first 64 indices, worker 2 reads the next 64 indices, and so on. Is there a way to have the workers read interleaved indices?
For example, when
num_workers = 3:
- worker 1 reads indices 1, 4, 7, 10…
- worker 2 reads indices 2, 5, 8, 11…
- worker 3 reads indices 3, 6, 9, 12…
It only starts with the epoch, when the iterator is created. Meaning, when you do:
for sample in dataloader:
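As a small illustration (using num_workers=0 so the sketch stays deterministic; with num_workers > 0, the worker processes are started at the same point, when the iterator is created):

```python
from torch.utils.data import DataLoader, Dataset

class ToyDataset(Dataset):
    """Hypothetical dataset that just returns its index."""
    def __len__(self):
        return 8

    def __getitem__(self, idx):
        return idx

loader = DataLoader(ToyDataset(), batch_size=4, num_workers=0)

# Nothing is fetched when the DataLoader is constructed.
# Fetching (and, with num_workers > 0, worker startup) begins here:
it = iter(loader)      # this is what the for-loop does under the hood
first = next(it)       # the first batch is produced on demand
```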
You can specify a
sampler (doc) when you create the DataLoader; it is responsible for drawing the samples.
You can control the sampler to force a certain pattern in the content of the batch (and thus what the workers will load).
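A sketch of such a sampler (the `InterleavedSampler` name and its `stride` parameter are made up for illustration): since workers pick up batches round-robin, yielding strided index runs means batch 0 ([0, 3, 6, 9]) goes to worker 0, batch 1 ([1, 4, 7, 10]) to worker 1, and so on, so each worker reads an interleaved slice of the dataset.

```python
from torch.utils.data import DataLoader, Dataset, Sampler

class InterleavedSampler(Sampler):
    """Hypothetical sampler: yields strided runs of indices so that
    consecutive batches contain interleaved indices."""
    def __init__(self, data_len, stride):
        self.data_len = data_len
        self.stride = stride

    def __iter__(self):
        # data_len=12, stride=3 -> 0,3,6,9, 1,4,7,10, 2,5,8,11
        for start in range(self.stride):
            yield from range(start, self.data_len, self.stride)

    def __len__(self):
        return self.data_len

class ToyDataset(Dataset):
    """Hypothetical dataset that just returns its index."""
    def __len__(self):
        return 12

    def __getitem__(self, idx):
        return idx

ds = ToyDataset()
loader = DataLoader(ds, batch_size=4,
                    sampler=InterleavedSampler(len(ds), stride=3),
                    num_workers=0)  # 0 here only to keep the sketch simple
```

Setting `stride` equal to num_workers reproduces the pattern asked about above (with 0-based indices).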
Ok, I will try that.
Thank you very much for your help!