Recently in my project I needed to dig into how DataLoader works. I know that setting num_workers>0 can speed up loading data into RAM, but when batch_size=1, why does loading speed still follow num_workers=4 > num_workers=2 > num_workers=0?
The dataset's __getitem__ is commonly defined as:

    def __getitem__(self, index):
        path = get_datafile(index)
        data = pil_loader(path)
        return data, get_target(index)
Are the workers caching data into RAM when num_workers>0, and if so, how do they know which indices to cache? Or is a single sample (batch_size=1) loaded by several workers at once?
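To make my question concrete, here is a minimal sketch of my mental model: the sampler fixes the index order up front, each batch (a single index when batch_size=1) is handed to one worker, and up to num_workers * prefetch_factor batches are loaded ahead of time, so worker B loads index k+1 while the consumer is still using index k. This is my own simplified simulation, not PyTorch's actual implementation; sketch_loader and pil_loader_stub are names I made up, and it uses threads instead of processes just to keep it runnable.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def pil_loader_stub(index):
    # Stand-in for pil_loader(get_datafile(index)); simulates I/O latency.
    time.sleep(0.01)
    return f"sample-{index}"

def sketch_loader(dataset_size, num_workers, prefetch_factor=2):
    # The sampler decides the full index order before any loading happens,
    # so every worker knows exactly which indices it is responsible for.
    indices = iter(range(dataset_size))

    if num_workers == 0:
        # Main process loads each sample itself, strictly one at a time.
        for i in indices:
            yield pil_loader_stub(i)
        return

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        depth = num_workers * prefetch_factor  # outstanding prefetched batches
        in_flight = []

        # Fill the prefetch pipeline: each submitted index goes to exactly
        # one worker (no sample is split across workers).
        for i in indices:
            in_flight.append(pool.submit(pil_loader_stub, i))
            if len(in_flight) >= depth:
                break

        # Steady state: yield the oldest result (order is preserved) and
        # immediately hand the next index to a free worker.
        for i in indices:
            yield in_flight.pop(0).result()
            in_flight.append(pool.submit(pil_loader_stub, i))

        # Drain the remaining prefetched samples.
        for fut in in_flight:
            yield fut.result()
```

Under this model, num_workers=4 beats num_workers=2 beats num_workers=0 even at batch_size=1 simply because more samples are being decoded concurrently ahead of consumption, not because one sample is shared among workers.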